A 2026-emerging convention, proposed by Jeremy Howard of Answer.AI (also a fast.ai co-founder), that publishes a curated markdown index of a site's most important content at /llms.txt for AI answer engines to read.

How is the file structured?

An H1 site title, a blockquote summary, H2 section headings, and link lists — distinct from sitemap.xml's exhaustive URL list.

Sites that adopt llms.txt tend to see faster AI-answer-engine indexing and higher cite rates inside the first 30 days.

llms.txt Conventions and Emerging 2026 Practice

Daniel Medina, Founder, No Brainer Media · May 26, 2026 · 7 min read

sitemap.xml tells a crawler where every page is. llms.txt tells an answer engine which fifteen pages actually matter — and as of 2026, almost none of your competitors have published one.

llms.txt is a small markdown file at your site root that hands AI answer engines a curated index of your best content. It is a 2026-emerging convention, not a ratified standard. Jeremy Howard of Answer.AI (also a fast.ai co-founder) proposed it; Perplexity and Claude read it today, and Google does not yet. Publishing it is a low-cost forward bet: the file is cheap to generate from your content graph, and early movers tend to capture an indexing premium while the convention is still young. Here is the structure, the build-time emission, the failure modes, and exactly who reads it today.

What llms.txt is, and what it isn't

The parent pillar, AI-Native Website in 4 Weeks, names llms.txt as the fourth of the five AI-native properties. Three orientations before the conventions:

Curated, not exhaustive. sitemap.xml indexes every URL you want crawled; llms.txt indexes the handful you consider most important for AI retrieval. A 200-URL sitemap plus a 15-URL llms.txt is the right shape for a service business. A 200-URL llms.txt dilutes the signal it exists to concentrate.

Markdown, not XML. The convention picks markdown because its audience is language models, and markdown is the medium they are most fluent in. Rendering it as XML would defeat the purpose.

Emerging, not ratified. The proposal dates to late 2024. Adoption is partial: Perplexity reads it, Claude reads it when its retrieval surfaces a candidate URL, and Google's AI Overviews has not committed. This is a forward-looking bet, not a defensive necessity.

The file structure — H1, blockquote, H2 sections, link lists

The canonical structure, per the proposed convention:

# No Brainer Media

> Marketing operations that close revenue — not raw clicks. Built for owners who reject corporate nonsense. We build, wire, and run the marketing stack — AI-native website, CRM automation, content engine, server-side attribution — for service businesses with material paid-media spend.

## Services

- [Google Ads API Integration](https://nobrainermedia.com/services/google-ads-api-integration/): Server-side conversion attribution via the Google Ads API. Four-week implementation.
- [AI Native Website Production](https://nobrainermedia.com/services/ai-native-website-production/): Astro 6 + Cloudflare Workers static stack. Sub-1.5s LCP. Four-week build.

## Pillars

- [Server-Side Conversion Attribution](https://nobrainermedia.com/blog/server-side-conversion-attribution/): Closed revenue, not raw clicks. Why client-side pixels broke after 2021 and what to build instead.
- [AI-Native Website in 4 Weeks](https://nobrainermedia.com/blog/ai-native-website-in-4-weeks/): The boring stack that prints. Astro 6 + Cloudflare Workers + content-as-code + schema + llms.txt.

## Case studies

- [UAC — Roofing](https://nobrainermedia.com/case-studies/uac/): Server-side attribution for a contractor — closed revenue fed straight to Google's bidder.

## Optional

- [Blog index](https://nobrainermedia.com/blog/): Full essay and recipe library.
- [llms-full.txt](https://nobrainermedia.com/llms-full.txt): Full-content variant for retrieval systems that ingest body text, not just URLs.

Six structural rules:

One H1, the site name — short, brandable, recognizable.
A blockquote summary right after the H1 — your elevator pitch, written as the 100–200 words an LLM might quote verbatim when asked what you do.
H2 sections group the links — Services, Pillars, Case studies, About, Optional. Loose conventions; pick what fits.
Each link is a markdown link plus an optional one-sentence description giving the model context on what the URL is.
An "Optional" section signals content the retrieval system can deprioritize if it needs to be selective.
Keep it short — 10–25 links covers most service sites; 100+ defeats the point.
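
The six rules are mechanical enough to lint in CI. What follows is a minimal sketch, not part of the llms.txt proposal: `checkLlmsTxt`, `LlmsTxtReport`, and the 25-link ceiling are illustrative names and defaults, and the link pattern assumes absolute http(s) URLs.

```typescript
// Hypothetical lint for the six structural rules; names and limits are assumptions.
interface LlmsTxtReport {
  h1Count: number
  hasSummary: boolean
  sections: string[]
  linkCount: number
  problems: string[]
}

function checkLlmsTxt(source: string, maxLinks = 25): LlmsTxtReport {
  const lines = source.split('\n').map(line => line.trim()).filter(line => line !== '')

  // Rule 1: exactly one H1 ("# ", which "## " does not match).
  const h1Index = lines.findIndex(line => /^# \S/.test(line))
  const h1Count = lines.filter(line => /^# \S/.test(line)).length
  // Rule 2: the first non-empty line after the H1 is the blockquote summary.
  const hasSummary = h1Index >= 0 && (lines[h1Index + 1] ?? '').startsWith('> ')
  // Rule 3: H2 headings name the sections.
  const sections = lines.filter(line => /^## \S/.test(line)).map(line => line.slice(3))
  // Rule 4: "- [title](url)" with an optional ": description".
  const linkPattern = /^- \[[^\]]+\]\(https?:\/\/[^)\s]+\)(: .+)?$/
  const listItems = lines.filter(line => line.startsWith('- '))
  const linkCount = listItems.filter(line => linkPattern.test(line)).length

  const problems: string[] = []
  if (h1Count !== 1) problems.push(`expected exactly one H1, found ${h1Count}`)
  if (!hasSummary) problems.push('no blockquote summary directly after the H1')
  if (sections.length === 0) problems.push('no H2 sections')
  if (linkCount < listItems.length) problems.push('some list items are not markdown links')
  // Rule 6: keep it short. Rule 5 (an Optional section) is allowed but not required.
  if (linkCount > maxLinks) problems.push(`${linkCount} links exceeds the limit of ${maxLinks}`)

  return { h1Count, hasSummary, sections, linkCount, problems }
}

const sample = [
  '# Example Co', '',
  '> One-paragraph summary of what the company does.', '',
  '## Services', '',
  '- [Service A](https://example.com/a/): What A is.',
  '- [Service B](https://example.com/b/)', '',
  '## Optional', '',
  '- [Blog](https://example.com/blog/): Everything else.',
].join('\n')

const report = checkLlmsTxt(sample)
console.log(report.problems.length === 0 ? 'llms.txt looks well-formed' : report.problems.join('\n'))
```

Run against the generated file before deploy, a check like this fails the build on a missing summary or a link list that has quietly grown past the ceiling.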

The llms-full.txt variant — full content, not just URLs

A second proposed file, /llms-full.txt, includes the body text of the linked pages rather than just URLs. It keeps the same H1, blockquote, and sections, but each link is followed by the full markdown of the page. Three positions are defensible: publish both and let the retrieval system choose (the No Brainer Media default); publish llms.txt only and let engines fetch full content per URL; or publish neither until a major engine commits (the conservative call, reasonable if you already rank strongly). Cost is low either way, because both files emit at build time.
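
To make the full-content layout concrete, here is a rough sketch of a renderer. `renderLlmsFull`, `FullPage`, and `FullSection` are hypothetical names. Demoting each page's own headings by two levels is an assumption of this sketch, made so a page-level H1 cannot collide with the file's single H1; the convention does not prescribe it.

```typescript
// Hypothetical renderer for llms-full.txt; the heading demotion is an assumption.
interface FullPage {
  title: string
  url: string
  summary: string
  body: string // the page's markdown source
}

interface FullSection {
  heading: string
  pages: FullPage[]
}

// Shift markdown headings down by `by` levels, skipping fenced code blocks.
function demoteHeadings(markdown: string, by = 2): string {
  let inFence = false
  return markdown.split('\n').map(line => {
    if (/^(```|~~~)/.test(line.trimStart())) inFence = !inFence
    if (inFence) return line
    const match = /^(#{1,6}) (.*)$/.exec(line)
    if (!match) return line
    const level = Math.min(6, (match[1] ?? '').length + by)
    return `${'#'.repeat(level)} ${match[2] ?? ''}`
  }).join('\n')
}

function renderLlmsFull(siteName: string, pitch: string, sections: FullSection[]): string {
  const out: string[] = [`# ${siteName}`, '', `> ${pitch}`, '']
  for (const section of sections) {
    out.push(`## ${section.heading}`, '')
    for (const page of section.pages) {
      // Same link line as llms.txt, followed by the page body inline.
      out.push(`- [${page.title}](${page.url}): ${page.summary}`, '')
      out.push(demoteHeadings(page.body.trim()), '')
    }
  }
  return out.join('\n').trimEnd() + '\n'
}

const fullText = renderLlmsFull('Example Co', 'What the company does.', [
  {
    heading: 'Pillars',
    pages: [{
      title: 'Guide',
      url: 'https://example.com/guide/',
      summary: 'A guide.',
      body: '# Guide\n\nBody text.\n',
    }],
  },
])
console.log(fullText)
```

Because the link lines are identical in both files, one page list can feed both renderers.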

Build-time emission from the content graph

Never hand-author these files. A build script reads the content graph for pages flagged include_in_llms_txt: true and emits the markdown:

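// scripts/build-llms-txt.ts
// Emits dist/llms.txt from pages flagged include_in_llms_txt: true.
import { mkdirSync, writeFileSync } from 'node:fs'
import { readContentGraph } from './lib/content-graph'

const graph = readContentGraph('_meta/index.db')
const services = graph.where({ page_type: 'service_detail', include_in_llms_txt: true })
const pillars = graph.where({ page_type: 'pillar', include_in_llms_txt: true })

// One "- [title](url): summary" line per page; the description reuses the summary field.
const toLink = (page: { title: string; canonical_url: string; summary: string }) =>
  `- [${page.title}](${page.canonical_url}): ${page.summary}`

const llmsTxt = [
  '# No Brainer Media', '',
  '> Marketing operations that close revenue — not raw clicks. ...', '',
  '## Services', '',
  ...services.map(toLink), '',
  '## Pillars', '',
  ...pillars.map(toLink),
  '', // ends the file with a trailing newline
].join('\n')

mkdirSync('dist', { recursive: true }) // dist/ may not exist on a clean checkout
writeFileSync('dist/llms.txt', llmsTxt)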

Three patterns earn naming. The include_in_llms_txt: true frontmatter flag lets you curate per page (off by default for blog posts, on for services and pillars). Descriptions come from the existing summary: field, not a separate one, so there is a single source of truth. And the same script emits both llms.txt and llms-full.txt; the excerpt above shows only the llms.txt half, and the llms-full.txt half appends each page's body inline.
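
The per-type defaults can be sketched as a small lookup consulted only when a page has no explicit flag. This is an illustration under stated assumptions: the script above filters on an explicit `include_in_llms_txt: true`, so resolving defaults in code is one option rather than the documented behavior, and `PageMeta`, `DEFAULT_INCLUDE`, and the `blog_post` type name are hypothetical.

```typescript
// Hypothetical page model. page_type, include_in_llms_txt, title, canonical_url and
// summary mirror the build script; 'blog_post' and the defaults table are assumptions.
type PageType = 'service_detail' | 'pillar' | 'blog_post'

interface PageMeta {
  title: string
  canonical_url: string
  summary: string
  page_type: PageType
  include_in_llms_txt?: boolean // explicit frontmatter flag, when the author set one
}

// On for services and pillars, off for blog posts.
const DEFAULT_INCLUDE: Record<PageType, boolean> = {
  service_detail: true,
  pillar: true,
  blog_post: false,
}

// An explicit flag always wins; otherwise fall back to the page type's default.
function isIncluded(page: PageMeta): boolean {
  return page.include_in_llms_txt ?? DEFAULT_INCLUDE[page.page_type]
}

// The description comes from summary; pages without one get a bare link.
function toLinkLine(page: PageMeta): string {
  const link = `- [${page.title}](${page.canonical_url})`
  const description = page.summary.trim()
  return description === '' ? link : `${link}: ${description}`
}

function linkLinesFor(pages: PageMeta[], type: PageType): string[] {
  return pages.filter(page => page.page_type === type && isIncluded(page)).map(toLinkLine)
}
```

Keeping the default in one table means a new page type needs one decision in one place, and a single blog post can still be promoted with an explicit flag.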

Discoverability — root, robots.txt, sitemap.xml

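llms.txt is discoverable in three places, and publishing all three gives crawlers the most ways to find it. First, the site root: /llms.txt and /llms-full.txt serve there automatically via the Cloudflare Workers Static Assets binding. Second, a robots.txt reference using an LLMs-txt: line. As far as the published texts go, this line is a nonstandard addition, defined neither by the llms.txt proposal nor by the robots.txt standard (RFC 9309):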

User-agent: *
Allow: /
Sitemap: https://nobrainermedia.com/sitemap.xml
LLMs-txt: https://nobrainermedia.com/llms.txt
LLMs-txt: https://nobrainermedia.com/llms-full.txt

The LLMs-txt: directive is not yet recognized everywhere but does no harm. Third, a sitemap.xml entry for /llms.txt, which surfaces it to crawlers that treat the sitemap as their primary discovery surface.
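
If the build already emits llms.txt, the robots.txt lines and the sitemap entry can come from the same step. The helpers below are a hedged sketch: `renderRobotsTxt` and `sitemapUrlEntry` are hypothetical names, the `LLMs-txt:` line simply mirrors the example above, and the sitemap entry is reduced to a bare `<loc>`.

```typescript
// Hypothetical emitters for the robots.txt reference and the sitemap entry.
function trimOrigin(origin: string): string {
  return origin.replace(/\/+$/, '') // drop trailing slashes so paths join cleanly
}

function renderRobotsTxt(
  origin: string,
  llmsPaths: string[] = ['/llms.txt', '/llms-full.txt'],
): string {
  const base = trimOrigin(origin)
  return [
    'User-agent: *',
    'Allow: /',
    `Sitemap: ${base}/sitemap.xml`,
    // Nonstandard line; parsers that do not know it are expected to skip it.
    ...llmsPaths.map(path => `LLMs-txt: ${base}${path}`),
  ].join('\n') + '\n'
}

// Escape the five XML special characters before writing a URL into <loc>.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;')
}

function sitemapUrlEntry(origin: string, path: string): string {
  return `  <url><loc>${escapeXml(trimOrigin(origin) + path)}</loc></url>`
}

console.log(renderRobotsTxt('https://nobrainermedia.com'))
console.log(sitemapUrlEntry('https://nobrainermedia.com', '/llms.txt'))
```

Deriving all three references from one origin string keeps the root file, robots.txt, and sitemap.xml pointing at the same host.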

The four failure modes

Broken links. The file gets indexed eagerly; dead URLs cost credibility. Build-time generation guarantees canonical URLs, and CI tests every one with curl -fsS before deploy.
Staleness. Hand-authored files drift as the catalog changes. Generate on every deploy.
Bloat. 100+ entries defeat curation. Keep the include_in_llms_txt flag deliberate.
Canonical drift between llms.txt and sitemap.xml. Trailing-slash, www, and http/https mismatches. Generate both from the same canonical-URL helper, never independently.
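
The canonical-URL helper behind the fourth failure mode might look like the sketch below. The policy it encodes (https only, no www, trailing slash on page URLs, no query or fragment) is an assumption read off the example URLs in this post. `canonicalUrl` and `findCanonicalDrift` are hypothetical names, and the drift check assumes every llms.txt URL should also appear in the sitemap.

```typescript
// Hypothetical canonical-URL policy: https, no "www.", trailing slash on page URLs.
function canonicalUrl(input: string): string {
  const url = new URL(input)
  url.protocol = 'https:'
  url.hostname = url.hostname.replace(/^www\./, '')
  url.hash = ''
  url.search = ''
  // File-like paths such as /llms-full.txt keep their exact form.
  const lastSegment = url.pathname.split('/').pop() ?? ''
  if (!lastSegment.includes('.') && !url.pathname.endsWith('/')) {
    url.pathname += '/'
  }
  return url.toString()
}

interface DriftIssue {
  url: string
  reason: 'not-canonical' | 'missing-from-sitemap'
}

// Flags llms.txt URLs that are not in canonical form or absent from the sitemap.
function findCanonicalDrift(llmsUrls: string[], sitemapUrls: string[]): DriftIssue[] {
  const sitemap = new Set(sitemapUrls)
  const issues: DriftIssue[] = []
  for (const url of llmsUrls) {
    if (canonicalUrl(url) !== url) {
      issues.push({ url, reason: 'not-canonical' })
    } else if (!sitemap.has(url)) {
      issues.push({ url, reason: 'missing-from-sitemap' })
    }
  }
  return issues
}

const drift = findCanonicalDrift(
  ['https://example.com/a/', 'http://www.example.com/b', 'https://example.com/c/'],
  ['https://example.com/a/', 'https://example.com/b/'],
)
console.log(drift)
```

If both generators call the same helper, this check should return an empty list; a non-empty result in CI means something built a URL by hand.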

Who reads it as of 2026

Reads it today: Perplexity (publicly committed), Claude (when retrieval surfaces a candidate URL), Answer.AI's own llms.txt tooling, and a growing list of documentation crawlers. Hasn't committed: Google AI Overviews, ChatGPT search (Bing-backed), and Gemini, all of which lean on traditional crawl plus structured data. Likely future readers, given the trajectory: Apple Intelligence's web layer, the next Copilot release, and any new entrant to the category. Even if only one more major engine adopts it, the early-mover indexing premium tends to last long enough to matter.

Closing

llms.txt is the fourth AI-native property because answer engines are increasingly the first surface a buyer meets, and this file makes you legible to that surface in a way sitemap.xml cannot. The convention is new, the adoption partial, the cost small, the upside asymmetric. The boring stack that prints publishes both files, generated at build time from the content graph, referenced from robots.txt and sitemap.xml, and served at the root. Variations should be justified at code review.
