What is llms.txt?

A proposed standard for making websites readable by large language models. Think of it as robots.txt, but for AI understanding rather than crawl access.

The Problem

When a large language model like GPT-4, Claude, or Gemini needs to understand your website, it has to crawl and parse your HTML — just like a search engine. But unlike search engines, LLMs don't just need keywords and links. They need to understand your site's structure, purpose, and content hierarchy.

HTML pages are designed for human browsers: navigation menus, sidebars, footers, cookie banners. An LLM parsing your homepage sees all of this noise alongside your actual content. llms.txt solves this by giving AI models a clean, structured overview.

The Format

llms.txt is a plain-text file served at /llms.txt at the root of your domain. It uses Markdown formatting with a simple structure:

# Site Name

> Brief description of what this site is about

## Section Name
- [Page Title](https://example.com/page): Short description of the page
- [Another Page](https://example.com/other): What this page covers

## Another Section
- [API Docs](https://example.com/docs/api): API reference documentation

Line 1: H1 heading with your site or project name.

Line 3: Blockquote with a one-line description.

Sections: H2 headings that group related pages.

Links: Markdown links with a colon-separated description.
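The structure above is simple enough to parse in a few lines of Python. This is an illustrative sketch, not an official parser, and the function name is ours:

```python
import re

def parse_llms_txt(text):
    """Parse an llms.txt body into {section: [(title, url, description)]}.

    A minimal sketch of the format described above; it skips the H1,
    the blockquote, and anything else that is not a section or a link.
    """
    sections = {}
    current = None
    # Matches "- [Title](url): description" (description optional).
    link_re = re.compile(
        r"^- \[(?P<title>[^\]]+)\]\((?P<url>[^)]+)\)(?::\s*(?P<desc>.*))?$"
    )
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("## "):
            current = line[3:]
            sections[current] = []
        elif current is not None:
            m = link_re.match(line)
            if m:
                sections[current].append((m["title"], m["url"], m["desc"] or ""))
    return sections
```

Feeding it the example above yields one entry per H2 section, each holding the page links with their descriptions.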

llms.txt vs llms-full.txt

The proposal describes two files:

llms.txt

A concise index of your site — titles, URLs, and brief descriptions. Typically 1-5 KB. Designed for AI models to quickly understand what your site offers and find relevant pages.

llms-full.txt

The full content of every page, concatenated into a single markdown file. Can be 100 KB+. Designed for AI models that want to ingest your entire site's content in one request.
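A sketch of how such a file might be assembled from per-page Markdown; the separator and the helper name are our assumptions, not part of the proposal:

```python
def generate_llms_full(pages):
    """Concatenate per-page Markdown into one llms-full.txt document.

    `pages` is a list of (title, markdown_body) tuples. The horizontal
    rule between pages is an assumed convention for readability.
    """
    parts = [f"# {title}\n\n{body.strip()}\n" for title, body in pages]
    return "\n---\n\n".join(parts)
```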

Who Uses It

The llms.txt proposal was created by Jeremy Howard and has been adopted by a growing number of sites. Notable early adopters include documentation sites, developer tools, and API providers who want their content accurately represented by AI models.

AI coding tools such as Cursor and Cline use llms.txt to understand project documentation. Search-augmented AI systems check for llms.txt before falling back to HTML parsing.
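That check-then-fall-back order can be sketched as a small routine; `fetch` here is a hypothetical stand-in for any HTTP client, which keeps the logic testable without network access:

```python
from urllib.parse import urljoin

def discover_content(base_url, fetch):
    """Prefer llms.txt; fall back to the HTML homepage.

    `fetch` is any callable (url) -> text, returning None on failure.
    Returns a (source, text) pair so the caller knows which path won.
    """
    text = fetch(urljoin(base_url, "/llms.txt"))
    if text is not None:
        return ("llms.txt", text)
    return ("html", fetch(base_url))
```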

Common Mistakes

Serving HTML instead of plain text

Set the Content-Type header to text/plain. Some web servers return text/html for unmatched paths, and single-page-app catch-all routes often serve the HTML shell instead of the file.
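A quick way to verify is to inspect the header your server actually returns (for example via curl -sI https://example.com/llms.txt) and compare the media type, ignoring parameters such as charset. The helper below is illustrative:

```python
def is_plain_text(content_type):
    """True if a raw Content-Type header value is text/plain.

    Splits off parameters like "; charset=utf-8" before comparing,
    since servers commonly append them.
    """
    return content_type.split(";")[0].strip().lower() == "text/plain"
```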

Including every page

llms.txt should be curated — include your most important pages, not an exhaustive sitemap.

Missing descriptions

Each link should have a description after the colon. Without it, AI models can't assess relevance without fetching the page.

Stale content

Update llms.txt when you add or remove pages. Automate generation from your CMS or build pipeline.
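A minimal generator, assuming your CMS or build pipeline can hand you structured page data; the function name and input shape are ours:

```python
def generate_llms_txt(site_name, description, sections):
    """Render an llms.txt body from structured page data.

    `sections` maps a section name to a list of (title, url,
    description) tuples -- e.g. pulled from a CMS page index
    at build time, so the file never goes stale.
    """
    lines = [f"# {site_name}", "", f"> {description}", ""]
    for section, pages in sections.items():
        lines.append(f"## {section}")
        for title, url, desc in pages:
            lines.append(f"- [{title}]({url}): {desc}")
        lines.append("")
    return "\n".join(lines)
```

Running this on every deploy keeps llms.txt in sync with the pages you actually publish.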

Blocking AI crawlers in robots.txt

If your robots.txt blocks GPTBot or ClaudeBot, they can't fetch your llms.txt either.
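Python's standard library can check this locally. The sketch below parses a robots.txt body and reports whether the named crawlers may fetch /llms.txt; the function name is ours, and the agent names follow the ones mentioned above:

```python
from urllib.robotparser import RobotFileParser

def ai_crawlers_allowed(robots_txt, agents=("GPTBot", "ClaudeBot")):
    """Return {agent: bool} for whether each crawler may fetch /llms.txt.

    Parses the robots.txt rules in memory, so no network access is
    needed to audit a site's configuration.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, "/llms.txt") for agent in agents}
```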

Generate Yours Automatically

w2agent crawls your site, extracts content, and generates both llms.txt and llms-full.txt automatically. It handles the formatting, deduplication, and content extraction so you don't have to write it by hand.

Audit your site now

Get a free AI readiness score and generate the files your site needs.
