The Problem
When a large language model like GPT-4, Claude, or Gemini needs to understand your website, it has to crawl and parse your HTML — just like a search engine. But unlike search engines, LLMs don't just need keywords and links. They need to understand your site's structure, purpose, and content hierarchy.
HTML pages are designed for human browsers: navigation menus, sidebars, footers, cookie banners. An LLM parsing your homepage sees all of this noise alongside your actual content. llms.txt solves this by giving AI models a clean, structured overview.
The Format
llms.txt is a plain-text file served at /llms.txt on your domain. It uses Markdown-like formatting with a simple structure:
```
# Site Name

> Brief description of what this site is about

## Section Name

- [Page Title](https://example.com/page): Short description of the page
- [Another Page](https://example.com/other): What this page covers

## Another Section

- [API Docs](https://example.com/docs/api): API reference documentation
```
- Line 1: an H1 heading with your site or project name.
- Line 3: a blockquote with a one-line description.
- Sections: H2 headings that group related pages.
- Links: Markdown links, each with a colon-separated description.
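The format is simple enough to parse in a few lines. Here is a minimal sketch in Python; the function name, regex, and dictionary fields are illustrative choices, not part of the llms.txt proposal:

```python
import re

def parse_llms_txt(text):
    """Parse an llms.txt document into a title, description, and link sections."""
    doc = {"title": None, "description": None, "sections": {}}
    current = None
    # Matches lines like: - [Page Title](https://example.com/page): description
    link_re = re.compile(r"^- \[(?P<title>[^\]]+)\]\((?P<url>[^)]+)\)(?::\s*(?P<desc>.*))?$")
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# ") and doc["title"] is None:
            doc["title"] = line[2:].strip()
        elif line.startswith("> ") and doc["description"] is None:
            doc["description"] = line[2:].strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            doc["sections"][current] = []
        else:
            match = link_re.match(line)
            if match and current:
                doc["sections"][current].append(match.groupdict())
    return doc
```

A parser this small is enough for tooling that validates your file in CI before deploying it.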
llms.txt vs llms-full.txt
The specification defines two files:
llms.txt
A concise index of your site — titles, URLs, and brief descriptions. Typically 1-5 KB. Designed for AI models to quickly understand what your site offers and find relevant pages.
llms-full.txt
The full content of every page, concatenated into a single markdown file. Can be 100 KB+. Designed for AI models that want to ingest your entire site's content in one request.
Who Uses It
The llms.txt proposal was created by Jeremy Howard and has been adopted by a growing number of sites. Notable early adopters include documentation sites, developer tools, and API providers who want their content accurately represented by AI models.
AI coding tools like Cursor and Cline use llms.txt to understand project documentation. Search-augmented AI systems check for llms.txt before falling back to HTML parsing.
Common Mistakes
Serving HTML instead of plain text
Set Content-Type to text/plain. Some web servers default to text/html for unknown extensions.
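A small helper can catch this misconfiguration in a deploy check. The sketch below validates a Content-Type header value; the function name is ours, and the parsing is a simplified take on the header syntax (it only strips parameters such as charset):

```python
def is_plain_text(content_type_header):
    """Return True if a Content-Type header value resolves to text/plain.

    Accepts values with parameters, e.g. "text/plain; charset=utf-8".
    """
    media_type = content_type_header.split(";")[0].strip().lower()
    return media_type == "text/plain"
```

In practice you would run this against the header returned by a HEAD request to your live /llms.txt URL.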
Including every page
llms.txt should be curated — include your most important pages, not an exhaustive sitemap.
Missing descriptions
Each link should have a description after the colon. Without one, AI models must fetch the page just to assess its relevance.
Stale content
Update llms.txt when you add or remove pages. Automate generation from your CMS or build pipeline.
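Generation is straightforward to script. A minimal sketch, assuming your CMS or build step can export page metadata in a structure like the one below (the function name and tuple layout are our own):

```python
def generate_llms_txt(site_name, description, sections):
    """Render an llms.txt document from site metadata.

    `sections` maps a section name to a list of (title, url, description) tuples.
    """
    lines = [f"# {site_name}", "", f"> {description}"]
    for section, pages in sections.items():
        lines += ["", f"## {section}"]
        for title, url, desc in pages:
            lines.append(f"- [{title}]({url}): {desc}")
    return "\n".join(lines) + "\n"
```

Running this in your build pipeline keeps the file in sync with your content automatically.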
Blocking AI crawlers in robots.txt
If your robots.txt blocks GPTBot or ClaudeBot, they can't fetch your llms.txt either.
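You can verify this with Python's standard-library robots.txt parser. A sketch (the helper name and example URL are ours; GPTBot and ClaudeBot are the user agents the crawlers identify as):

```python
from urllib.robotparser import RobotFileParser

def can_fetch_llms_txt(robots_txt, user_agent, url="https://example.com/llms.txt"):
    """Return True if the given robots.txt rules allow user_agent to fetch llms.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

Running this check against your live robots.txt for each AI user agent you care about makes the failure mode visible before it costs you crawls.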
Generate Yours Automatically
w2agent crawls your site, extracts content, and generates both llms.txt and llms-full.txt automatically. It handles the formatting, deduplication, and content extraction so you don't have to write it by hand.
Real-World Examples
Adoption is strongest in a few categories. Documentation sites see the most benefit: developer tools like Tailwind CSS, Supabase, and Vercel use llms.txt so that coding agents can look up API syntax without parsing HTML docs. The format has also spread to API providers, open source projects, and SaaS products that want accurate AI representation.
A minimal but effective llms.txt for a SaaS product looks like this:
```
# Acme

> Project management software for engineering teams

## Core Features

- [Dashboard](https://acme.com/docs/dashboard): Overview of projects and team activity
- [API Reference](https://acme.com/docs/api): REST API for automating workflows
- [Webhooks](https://acme.com/docs/webhooks): Real-time event notifications

## Pricing

- [Plans](https://acme.com/pricing): Free, Pro ($12/user/mo), Enterprise
```
Notice the structure: a one-line blockquote description, sections that mirror your navigation, and a colon-separated description on every link. Agents use the descriptions to decide which pages to fetch, so well-written descriptions reduce unnecessary crawling.
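That selection step can be modeled as a simple relevance ranking over the descriptions. A toy sketch (real agents typically use embeddings or an LLM to judge relevance; this uses plain word overlap, and the function name is ours):

```python
def rank_pages(query, links):
    """Rank (title, url, description) links by word overlap with a query."""
    query_words = set(query.lower().split())

    def score(link):
        title, _url, desc = link
        page_words = set(f"{title} {desc}".lower().split())
        return len(query_words & page_words)

    return sorted(links, key=score, reverse=True)
```

Even this crude scorer shows why empty descriptions hurt: a link with no description contributes almost nothing to match against.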
How llms.txt Relates to Other AI Files
llms.txt is one layer of a complete AI-readiness stack. It works alongside other files, each serving a different purpose:
- robots.txt — controls which AI crawlers can access your site at all. llms.txt is useless if AI crawlers are blocked before they can fetch it.
- agent-card.json — describes your site's capabilities for AI agents (APIs, tools). llms.txt describes your site's content.
- Schema.org JSON-LD — structured data embedded in your HTML pages. Works at the page level; llms.txt works at the site level.
A site with all three layers scores significantly higher on the w2agent audit because each file signals AI-readiness to a different part of the discovery pipeline.
Related Articles
- robots.txt for AI Bots — Control which crawlers can access your site before they reach llms.txt.
- agent-card.json — Declare your site's API capabilities for AI agents.
- Schema.org for AI — Structured data that works at the page level, complementing your llms.txt.
Score your site now
Get your free w2agent score and generate the files your site needs.