Name: w2agent
Author: w2agent

How AI Uses Structured Data

When an AI model encounters a web page, it needs to answer basic questions: What is this page about? Is this a product, an article, a person, an organization? What are the key facts?

Without structured data, the AI has to infer these answers from the page's text content — a process that's error-prone and context-dependent. With schema.org JSON-LD, the answers are explicit and machine-readable. The AI doesn't have to guess that "$29.99" is a price — it's labeled as schema:price.
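
As a minimal sketch of what "explicit and machine-readable" means in practice, the snippet below reads a price out of a JSON-LD payload by key lookup alone, with no text inference. The payload and the `read_price` helper are illustrative assumptions, not part of any particular AI system.

```python
import json

# Hypothetical JSON-LD payload, as it might appear in a product page's markup.
RAW_JSONLD = """
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Widget Pro",
  "offers": {"@type": "Offer", "price": "29.99", "priceCurrency": "USD"}
}
"""

def read_price(raw: str) -> tuple[str, str]:
    """Return (price, currency) from a Product that carries a single Offer."""
    data = json.loads(raw)
    offer = data["offers"]
    return offer["price"], offer["priceCurrency"]

print(read_price(RAW_JSONLD))  # ('29.99', 'USD')
```

The lookup assumes `offers` is a single object; real pages may use a list of offers, which a consumer would have to handle separately.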

Which Types Matter Most

Schema.org has 800+ types, but AI models primarily use a small subset. Here are the types that have the most impact on AI readiness:

Article / BlogPosting

Use on: Blog posts, news, guides, tutorials

Helps AI identify the article's headline, author, publish date, and topic. Critical for content sites.

{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "How to Configure robots.txt",
  "author": { "@type": "Person", "name": "Jane Doe" },
  "datePublished": "2025-01-15"
}

Organization

Use on: Homepage, about page

Establishes your brand identity for AI — name, logo, social profiles, contact info. Helps AI accurately attribute content.

{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Acme Corp",
  "url": "https://acme.com",
  "logo": "https://acme.com/logo.png"
}

Product

Use on: E-commerce product pages

Price, availability, reviews — the facts AI shopping assistants need to make recommendations.

{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Widget Pro",
  "offers": {
    "@type": "Offer",
    "price": "29.99",
    "priceCurrency": "USD"
  }
}

FAQPage

Use on: FAQ sections, support pages

Question-answer pairs are directly consumable by AI assistants. Arguably the most AI-friendly schema type, because each pair maps onto a question a user might actually ask.

{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [{
    "@type": "Question",
    "name": "What is llms.txt?",
    "acceptedAnswer": {
      "@type": "Answer",
      "text": "A standard for..."
    }
  }]
}
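
If the questions already live in a CMS or a data file, the FAQPage object can be generated rather than hand-written. The following is a rough sketch under that assumption; `build_faq_page` is a hypothetical helper and the question text is placeholder content.

```python
import json

def build_faq_page(pairs: list[tuple[str, str]]) -> dict:
    """Build a schema.org FAQPage object from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }

# Placeholder content; use the questions actually shown on the page.
faq = build_faq_page([
    ("Is there a free plan?", "Yes, the example plan is free."),
])
print(json.dumps(faq, indent=2))
```

Generating the markup from the same source as the visible FAQ keeps the two from drifting apart.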

SoftwareApplication

Use on: Tool/app landing pages

Tells AI what your software does, what platform it runs on, and whether it's free. Essential for developer tools.

{
  "@context": "https://schema.org",
  "@type": "SoftwareApplication",
  "name": "w2agent",
  "applicationCategory": "DeveloperApplication",
  "operatingSystem": "Any"
}

BreadcrumbList

Use on: Any page with navigation hierarchy

Helps AI understand where a page sits in your site structure — crucial for large sites.

{
  "@context": "https://schema.org",
  "@type": "BreadcrumbList",
  "itemListElement": [{
    "@type": "ListItem",
    "position": 1,
    "name": "Docs",
    "item": "https://example.com/docs"
  }]
}
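
On sites whose URL paths mirror the navigation hierarchy, a BreadcrumbList can be derived from the URL itself. The sketch below assumes exactly that, and also assumes display names can be guessed from URL slugs; `build_breadcrumbs` is a hypothetical helper, and sites with different routing would need their own mapping.

```python
from urllib.parse import urlsplit

def build_breadcrumbs(url: str) -> dict:
    """Build a BreadcrumbList with one ListItem per path segment of `url`."""
    parts = urlsplit(url)
    base = f"{parts.scheme}://{parts.netloc}"
    segments = [s for s in parts.path.split("/") if s]
    items = []
    for position, segment in enumerate(segments, start=1):
        items.append({
            "@type": "ListItem",
            "position": position,
            # Assumption: the display name is derived from the URL slug.
            "name": segment.replace("-", " ").title(),
            "item": base + "/" + "/".join(segments[:position]),
        })
    return {
        "@context": "https://schema.org",
        "@type": "BreadcrumbList",
        "itemListElement": items,
    }

crumbs = build_breadcrumbs("https://example.com/docs/getting-started")
for item in crumbs["itemListElement"]:
    print(item["position"], item["name"], item["item"])
```

For the example URL this yields two items, "Docs" and "Getting Started", each pointing at its cumulative path.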

Implementation: JSON-LD

Prefer JSON-LD over Microdata or RDFa. JSON-LD is embedded in a <script type="application/ld+json"> tag in your page's head or body. It is easier to maintain because it is not interleaved with your HTML, and it is the format Google recommends for structured data.

You can include multiple JSON-LD blocks on a single page — one for the Organization, one for the Article, one for BreadcrumbList. They're independent and don't need to reference each other.
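
One way to emit several independent blocks from a template or build script is sketched below. `render_jsonld` is a hypothetical helper, and the Organization and Article objects reuse the example values from earlier in this guide.

```python
import json

def render_jsonld(obj: dict) -> str:
    """Serialize one schema.org object into a JSON-LD <script> tag."""
    payload = json.dumps(obj, indent=2)
    # Escape "</" so a string value such as "</script>" cannot close the tag early.
    # "\/" is a valid JSON escape for "/", so the parsed data is unchanged.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'

blocks = [
    {"@context": "https://schema.org", "@type": "Organization",
     "name": "Acme Corp", "url": "https://acme.com"},
    {"@context": "https://schema.org", "@type": "Article",
     "headline": "How to Configure robots.txt", "datePublished": "2025-01-15"},
]

html = "\n".join(render_jsonld(block) for block in blocks)
print(html)
```

Each object carries its own `@context`, which is what lets the blocks stand alone without referencing each other.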

What AI Actually Reads

Not all schema.org properties are equally useful to AI. Focus on these high-value properties:

→ name/headline: The primary identifier — what is this thing?
→ description: A concise summary AI can use directly in responses.
→ author/publisher: Attribution and credibility signals.
→ datePublished/dateModified: Freshness signals; AI systems tend to favor recent content.
→ price/availability: For products — the facts users ask AI about.
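
A quick way to act on this list is to check each JSON-LD object for the properties that matter for its type. The sketch below is illustrative only: the `HIGH_VALUE` map is an assumed reading of the list above, not an official schema.org or search-engine requirement, and `missing_properties` is a hypothetical helper.

```python
# Assumption: this per-type map is an illustrative interpretation of the
# high-value properties listed above, not an authoritative requirement.
HIGH_VALUE = {
    "Article": ["headline", "description", "author", "datePublished"],
    "Product": ["name", "description", "offers"],
    "Organization": ["name", "url", "logo"],
}

def missing_properties(obj: dict) -> list[str]:
    """Return the high-value properties that are absent or empty in `obj`."""
    type_name = obj.get("@type", "")
    if isinstance(type_name, list):  # JSON-LD allows "@type" to be a list
        type_name = type_name[0] if type_name else ""
    expected = HIGH_VALUE.get(type_name, ["name", "description"])
    return [prop for prop in expected if not obj.get(prop)]

article = {
    "@type": "Article",
    "headline": "How to Configure robots.txt",
    "author": {"@type": "Person", "name": "Jane Doe"},
    "datePublished": "2025-01-15",
}
print(missing_properties(article))  # ['description']
```

Types not present in the map fall back to checking `name` and `description` only.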

Auto-Generate Structured Data

w2agent audits your existing structured data and generates missing schemas based on your page content. It detects page types (article, product, FAQ) and creates the appropriate JSON-LD.

Testing Your Structured Data

Before relying on schema.org markup to improve AI readiness, verify it's valid and parseable. Invalid JSON-LD is silently ignored — it doesn't show errors on the page, so issues are easy to miss.

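# Extract every JSON-LD block from a page and validate it with jq
# (perl slurps the whole response, so multi-line blocks are handled)
curl -s https://your-site.com/blog/post | \
  perl -0777 -ne 'print "$1\n" while m{<script[^>]*type="application/ld\+json"[^>]*>(.*?)</script>}gis' | \
  jq .

# Without jq: validate only the first block using Python's built-in json.tool
curl -s https://your-site.com/ | \
  perl -0777 -ne 'print $1 if m{<script[^>]*type="application/ld\+json"[^>]*>(.*?)</script>}is' | \
  python3 -m json.tool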

Google's Rich Results Test and the Schema Markup Validator (validator.schema.org) are the authoritative tools for deeper validation. The w2agent audit checks for the presence and syntactic validity of JSON-LD on every page it scans.
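
For scripted checks, an HTML parser is more robust than pattern matching on the raw response. The following is a minimal sketch using only the Python standard library. The `JsonLdExtractor` and `check_jsonld` names are hypothetical, the sample markup is made up, and the check covers JSON syntax only, not schema.org vocabulary.

```python
import json
from html.parser import HTMLParser

class JsonLdExtractor(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.blocks: list[str] = []
        self._in_jsonld = False

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self.blocks.append("")

    def handle_data(self, data):
        if self._in_jsonld:
            self.blocks[-1] += data  # data may arrive in several chunks

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

def check_jsonld(html: str) -> list[tuple[bool, str]]:
    """Return (is_valid, detail) per block: the @type if valid, else the parse error."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    results = []
    for raw in parser.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as exc:
            results.append((False, str(exc)))
        else:
            kind = data.get("@type", "?") if isinstance(data, dict) else "array"
            results.append((True, str(kind)))
    return results

# Made-up sample: the second block has a trailing comma and should fail to parse.
sample = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Article", "headline": "Hello"}
</script>
<script type="application/ld+json">{"@type": "Product", }</script>
</head></html>
"""
for ok, detail in check_jsonld(sample):
    print("valid" if ok else "INVALID", detail)
```

To run it against a live page, the HTML string could come from `urllib.request.urlopen(url).read().decode()` in place of the sample.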

Impact on AI Responses

Sites with complete schema.org markup tend to be represented more accurately in AI-generated responses. When an assistant such as ChatGPT or Perplexity summarizes your product, explicit fields such as price, availability, and description give it facts it can use directly instead of inferring them from page text. The difference is accuracy: parsing free text introduces errors that labeled fields avoid.

Schema.org works at the page level. For site-level discovery, pair it with llms.txt (content index) and agent-card.json (capability declaration) for a complete AI-readiness stack. The w2agent score measures all three layers together.


