LLMs Are Not Website Visitors: Why the LLM Info Page Trend Misreads the Architecture
Status: Original concept, first publication. Strategy Sandbox, jasonbarnard.com. Date: May 2026.
A trend has been building over the last six months: brands and consultants creating dedicated “LLM info pages” or “AI resources” pages on their websites, root-level URLs designed to tell visiting language models which other pages on the site they should pay attention to, what the brand stands for, and how the AI should reason about the brand’s offering. The intent is reasonable and the impulse sympathetic, but the mechanism the page assumes does not exist. LLMs aren’t website visitors; they’re reasoning systems querying an index, and the routing surface that would make a “for AI” page useful is not the surface AI engines actually use when they decide what to recommend.
This piece walks through the architecture, from the inference-time query down to what actually wins recruitment, and explains why the LLM info page trend is a category mistake that costs brands time, attention, and the canonical-signal clarity they were trying to build in the first place.
LLM info pages assume a routing mechanism that does not exist
The premise of the LLM info page is that an AI engine, when asked about your brand, visits your website, reads the “for AI” page first, and uses that page to navigate the rest of the site intelligently. None of that happens. AI engines don’t browse websites; they query indexes, and the indexes don’t honour intra-site routing instructions, because intra-site routing is a website concept, not an index concept. The Web Index that Google and Bing maintain stores annotated passages from across the web, and the recruitment passes that fire on every assistive query draw from the index, not from any page that claims to be a guide to a specific domain.
This is the same architecture I named in Three Recruitment Logics, and the LLM info page trend is what happens when the architecture is misread at the brand-side end of the chain. The brand sees AI engines making recommendations and assumes those engines visit the brand’s website to inform those recommendations. The actual mechanism, query fan-out hitting a shared Web Index that the bot fills continuously, has no entry point at the website level for the brand to optimise against. There’s no door labelled “this way for AI” because there’s no AI walking up to the door.
Inference-time grounding goes through the shared Web Index and the Knowledge Graph, not the website
When you ask ChatGPT, Google AI Mode, or Perplexity a question about a brand, the engine starts with a parametric check: does the LLM already hold the answer with sufficient confidence? If it does, it answers from its weights; the brand’s website is irrelevant to that answer, and no page on the brand’s domain participates in the response. If the LLM is uncertain, wants to corroborate before answering, or is preparing for the likely follow-up turn, it triggers a query fan-out. The fan-out runs sub-queries against the Search Engine layer, which means against Google’s or Bing’s shared Web Index, and, for entity-typed queries where one is available, against the Knowledge Graph in parallel. We have direct evidence of the Knowledge Graph as a parallel grounding source from the Search Generative Experience era, when SGE was caught quoting and citing entries from Google’s Knowledge Graph rather than from web pages, confirming that the KG is queried alongside the index during the grounding pass. For each sub-query, the engine returns ranked passages from the index and validated entity facts from the graph, and the LLM assembles the answer from what comes back.
At no point in that chain does the LLM “visit” the brand’s website to look at a page that tells it which other pages to read. The LLM isn’t on your website; it’s reading whatever the Search Engine and the Knowledge Graph surface for whichever sub-query has just fired. Your site, your sitemap, your navigation, and your “for AI” page are invisible to the inference-time retrieval pass unless they happen to win recruitment for a specific sub-query, and they win that recruitment on their merits like any other page in the index or any other entity in the graph. There is no privileged access route and no page-level routing layer. There are the Web Index and the Knowledge Graph, the Search Engine reading from them, and the LLM consuming the passages and facts that come back.
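The grounding flow above can be sketched as a toy model. Everything in it is an illustrative assumption: the function name, the confidence threshold, and the dictionaries standing in for the Web Index and the Knowledge Graph are hypothetical and do not describe any engine’s real interface. What the sketch makes visible is structural: the only inputs are the model’s own confidence, the sub-queries, the index, and the graph. No brand website appears anywhere in the signature.

```python
from dataclasses import dataclass, field

# Hypothetical threshold: at or above it, the model answers from its weights alone.
PARAMETRIC_CONFIDENCE_THRESHOLD = 0.8


@dataclass
class GroundingResult:
    source: str  # "parametric" or "fan-out"
    passages: list = field(default_factory=list)
    entity_facts: list = field(default_factory=list)


def ground_answer(parametric_confidence, sub_queries, web_index, knowledge_graph):
    """Toy model of inference-time grounding (hypothetical, illustrative only).

    web_index:       dict mapping sub-query -> ranked passages (stand-in for the Web Index)
    knowledge_graph: dict mapping sub-query -> entity facts (stand-in for the KG)
    Note what is absent: there is no parameter for the brand's website.
    """
    if parametric_confidence >= PARAMETRIC_CONFIDENCE_THRESHOLD:
        # Answer from weights: no page on any domain participates.
        return GroundingResult(source="parametric")

    result = GroundingResult(source="fan-out")
    for sub_query in sub_queries:
        # Each sub-query is checked against the shared index and, where an entry
        # exists, the graph (done sequentially here for simplicity).
        result.passages.extend(web_index.get(sub_query, []))
        result.entity_facts.extend(knowledge_graph.get(sub_query, []))
    return result


if __name__ == "__main__":
    index = {"acme widgets pricing": ["Acme widgets start at $10 (third-party review)."]}
    graph = {"acme founder": ["Acme: founder = Jane Doe"]}
    print(ground_answer(0.4, ["acme widgets pricing", "acme founder"], index, graph))
```

The brand can influence what sits inside the index and the graph, but there is no argument through which a page can tell the function which other pages to read.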
When the LLM crawls the page directly, it sees raw HTML
There’s a second path the active engines sometimes take, in which the LLM (or rather a bot acting on its behalf) fetches the page directly instead of reading the cached passage from the index. The lazy path takes what the index already has; the active path goes and gets the page itself. Both are real, and which one fires depends mostly on how recent the indexed version of the page is. The lazy path dominates when the indexed version is fresh enough that the engine has nothing to gain from going back to the source: recently crawled, recently annotated, the cached passage functionally current. The active path triggers when the indexed version is stale, when the engine needs a piece of the page the index didn’t capture, or when the engine wants to use page content differently from how the index has packaged it. The active path matters because of what it cannot see. Inference-time crawlers grab raw HTML, not a fully rendered DOM. Anything rendered client-side, anything lazy-loaded below the fold, anything fetched asynchronously after page load is invisible to the bot doing the fetching. The irony writes itself: most LLM info pages are being built with the same modern frameworks the active-path bot can barely read, so the page is invisible to the very mechanism it was built to influence.
This applies to the rest of the site too, but it lands hardest on the page that was specifically designed to be read by AI. If your “for AI” page is a single-page React application that hydrates client-side with the actual content load coming from a JavaScript fetch after page load, the bot fetching it sees a near-empty shell. The page looks correct in the browser, communicates well to humans, and tells the bot precisely nothing.
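One way to approximate what a raw-HTML fetcher receives is to pull the text out of the HTML response without executing any JavaScript. The sketch below does this with Python’s standard-library html.parser module. The two sample documents are invented for illustration, and a real check would fetch the live URL (for example with urllib.request.urlopen) instead of using inline strings. It approximates a non-rendering bot in general; it does not reproduce any specific crawler’s parsing.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects text nodes from raw HTML, skipping script/style/template contents.

    No JavaScript is executed, so anything injected client-side never appears.
    <noscript> content is kept, because it is present in the raw HTML.
    """

    SKIP_TAGS = {"script", "style", "template"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def raw_html_text(html):
    """Return the text a non-rendering fetcher could read from raw HTML."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical client-side-rendered "for AI" page: content arrives via JavaScript.
SPA_SHELL = """<!doctype html><html><head><title>For AI</title>
<script src="/static/js/main.js"></script></head>
<body><noscript>You need to enable JavaScript to run this app.</noscript>
<div id="root"></div></body></html>"""

# The same statement, server-rendered.
SERVER_RENDERED = """<!doctype html><html><head><title>For AI</title></head>
<body><main><h1>About Acme</h1><p>Acme makes widgets.</p></main></body></html>"""

if __name__ == "__main__":
    print(raw_html_text(SPA_SHELL))
    print(raw_html_text(SERVER_RENDERED))
```

On the client-side shell, the extracted text is only the page title and the noscript fallback message; the server-rendered version exposes the actual statement about the brand.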
Training crawlers and inference fetches are different operations, and both ignore routing instructions
It’s worth being explicit about the two crawler relationships, because they’re often conflated and both fail the routing assumption.
Training crawlers (GPTBot, ClaudeBot, and the rest) visit sites to gather training data for the next model refresh, and their crawl pattern is shaped by engineering priorities at both ends of the pipeline. Upstream of the crawl, the engineering teams selecting the next training corpus tell the crawlers what they’re looking for: we need to fill this gap, we need denser coverage in this domain, we need to strengthen E-E-A-T signals on this entity type. The crawlers prioritise content that matches those priorities and ignore content that doesn’t, even when it’s well marked up and prominently placed on the brand’s site, because their job isn’t to ingest the web indiscriminately but to fill specific gaps the engineering team has named. Downstream of the crawl, what comes back goes through the engineer-curation passes that produce the actual training corpus, which is the periodic layer I covered in Two Annotation Layers. The LLM info page might get crawled by these bots when it happens to fit the upstream priorities, but it gets crawled like any other page, with no routing privilege. The engineering teams selecting training corpora aren’t reading brand-supplied guidance about which pages on a domain should be weighted; they’re directing the crawlers based on the model’s internal priorities, and they’re doing the curation themselves.
Inference-time fetches are the second operation, separate from training crawls, fired in real time when an LLM needs to ground a specific claim. In most cases these fetches go through the Search Engine partnership for efficiency, which is why people reasonably ask, “why crawl at all if they have an agreement with Google or Bing?” The answer is that training crawls and inference fetches serve different layers, and neither treats the LLM info page as a routing instruction. Both treat it as a page to potentially ingest, and ingestion follows the same recruitment rules every other page is subject to: the page wins or loses on its content, its annotation, its corroboration, and its currency, not on the label “for AI.”
A standalone root-level info page violates single source of truth
This is the strongest part of the argument and it deserves to be named directly. The whole point of the website Entity Home and the pillar-page architecture I covered in the SEL series on the Entity Home is that one authoritative locus accumulates corroboration from across the rest of the site and the wider web. A standalone root-level “for AI” page either duplicates the canonical statement that already lives on the About page or the website Entity Home, in which case it dilutes the canonical signal by giving the algorithms a second statement of the same thing to reconcile against, or it claims hierarchical authority over the rest of the site that the algorithms don’t honour because no such routing layer exists.
Either way, you lose. Duplication weakens the signal because the algorithms reconcile across multiple statements and end up with lower confidence in any one of them, and authority claims that the architecture doesn’t honour are simply ignored. The page sits at root level outside the context-rich hierarchy that pillar pages live inside, with no parent context, no sibling context, no breadcrumb context, no internal-link context tying it into the rest of the site’s narrative. It has to manufacture its own context, which is exactly the failure mode pillar pages were designed to solve. A pillar page on “how our methodology works” inherits context from the parent service page, links to case-study evidence on sibling pages, and accumulates internal links from blog posts that reference it. A root-level “for AI” page inherits nothing, links to whatever the brand picked, and accumulates internal links only if the brand goes back through the site to insert them, at which point the brand has duplicated the work the pillar page would have done anyway, with no canonical anchor to land it.
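The context gap can be made concrete by modelling a site as breadcrumb parents plus internal links and asking, for any page, what parent, sibling, and inbound-link context it has. The site structure and URLs below are hypothetical, and counting links is a deliberately crude proxy for the richer signals described above.

```python
# Hypothetical site model: each page's breadcrumb parent (None = no parent).
BREADCRUMB_PARENT = {
    "/services": None,
    "/services/methodology": "/services",
    "/services/case-study-a": "/services",
    "/services/case-study-b": "/services",
    "/blog": None,
    "/blog/post-1": "/blog",
    "/for-ai": None,  # root-level "for AI" page
}

# Hypothetical internal links as (source, target) pairs.
INTERNAL_LINKS = [
    ("/services", "/services/methodology"),
    ("/services/methodology", "/services/case-study-a"),
    ("/services/methodology", "/services/case-study-b"),
    ("/blog/post-1", "/services/methodology"),
    ("/for-ai", "/services/methodology"),
]


def page_context(page, parents, links):
    """Return the parent, siblings, and inbound internal-link count for `page`."""
    parent = parents.get(page)
    # Siblings share a non-empty breadcrumb parent; top-level pages get none.
    siblings = sorted(
        p for p, p_parent in parents.items()
        if parent is not None and p_parent == parent and p != page
    )
    inbound = sum(1 for src, dst in links if dst == page and src != page)
    return {"parent": parent, "siblings": siblings, "inbound_links": inbound}


if __name__ == "__main__":
    for page in ("/services/methodology", "/for-ai"):
        print(page, page_context(page, BREADCRUMB_PARENT, INTERNAL_LINKS))
```

The pillar page reports a parent, two siblings, and three inbound links; the root-level page reports none of the three, and its only link points outward.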
The vanish test separates canonical pages from redundant ones, and the LLM info page fails it
Not every standalone page is a single-source-of-truth violation, and the architecture rewards some standalone pages handsomely when they earn their place. The diagnostic that separates pages worth building from pages worth skipping is the vanish test: if the page disappeared overnight, would the site lose a unique citeable source of truth for a question users and machines actually ask? If the answer is yes, build it. If not, the page is redundant, and redundant pages dilute the canonical signal regardless of what label sits on them.
Three categories sit underneath the test. Bad guide pages are thin “for AI” summaries with no unique information, weak or absent authorship, no corroboration backbone, and no structural integration into the rest of the site, and they fail the vanish test cleanly because nothing of substance disappears with them. Legitimate canonical pages are tightly scoped explainers on recurring high-value questions where centralising the answer serves humans and machines simultaneously: pricing methodology, return policy, clinical evidence, regulatory disclosures, and official company profiles with legal name, founder, and category. Distributed architecture, the preferred default for most pages on most sites, is the pattern where the homepage, the website Entity Home, product pages, author pages, policy pages, and supporting articles each do one job well and reinforce each other through internal linking and consistent schema, with no single page carrying the load alone.
The LLM info page fails the test because the work it claims to do is either already being done by the canonical pages it would duplicate, or it isn’t actually centralising a unique source of truth for any question worth asking.
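Under one simplifying assumption, that you can list the questions each page answers canonically, the vanish test reduces to a set difference: which of this page’s questions would be left with no canonical answer anywhere else on the site? Building that mapping is the judgment-heavy part; here it is simply given as hypothetical data.

```python
def vanish_test(page, canonical_answers):
    """Return the questions that would lose their only canonical source if `page` vanished.

    canonical_answers: dict mapping page URL -> set of questions it answers canonically.
    An empty result means the page fails the vanish test (it is redundant).
    """
    answered_elsewhere = set()
    for other_page, questions in canonical_answers.items():
        if other_page != page:
            answered_elsewhere |= questions
    return canonical_answers.get(page, set()) - answered_elsewhere


# Hypothetical site: the "for AI" page only restates what /about already covers.
SITE = {
    "/about": {"who is acme", "who founded acme", "what is acme's legal name"},
    "/pricing-methodology": {"how does acme price"},
    "/returns": {"what is acme's return policy"},
    "/for-ai": {"who is acme"},
}

if __name__ == "__main__":
    for page in SITE:
        lost = vanish_test(page, SITE)
        print(page, "keep" if lost else "redundant", sorted(lost))
```

In this toy site, every page except /for-ai would take at least one unique answer with it; removing /for-ai loses nothing.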
Independent quantified evidence: LLMs.txt scores 2.0 out of 10, last of 24 ranking factors
Cyrus Shepard’s AI Citation Ranking Factor Analysis, surfaced on LinkedIn and based on a meta-analysis of 55 experiments, patents, and case studies on what causes AI engines to cite content, scores LLMs.txt at 2.0 out of 10: dead last across 24 factors. The gap between LLMs.txt and the next-lowest factor (Domain Authority at 5.0) is 3.0 points, two-thirds of the 4.5-point spread between Domain Authority and the highest-scoring factor (URL Accessibility at 9.5). In other words, the other 23 factors all sit inside a 4.5-point band, and the one file specifically designed to speak to LLMs sits on its own, three points below it. That is independent, quantified, third-party evidence for the central claim of this piece: a brand-supplied “for AI” file produces near-zero recruitment effect.
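The comparison rests on simple subtraction over three of the scores quoted from the chart. The snippet below uses only those three published numbers; nothing else from the analysis is reproduced.

```python
# The three scores quoted above, out of 10.
SCORES = {
    "URL Accessibility": 9.5,  # highest-scoring factor
    "Domain Authority": 5.0,   # next-lowest factor
    "LLMs.txt": 2.0,           # lowest-scoring factor
}


def score_gaps(scores):
    """Return (gap below the next-lowest factor, spread of the remaining factors)."""
    gap_to_next_lowest = scores["Domain Authority"] - scores["LLMs.txt"]
    spread_of_rest = scores["URL Accessibility"] - scores["Domain Authority"]
    return gap_to_next_lowest, spread_of_rest


if __name__ == "__main__":
    gap, spread = score_gaps(SCORES)
    print(f"gap={gap:.1f} spread={spread:.1f} ratio={gap / spread:.2f}")  # gap=3.0 spread=4.5 ratio=0.67
```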
The structural finding in the same chart needs careful reading, because the surface story is misleading. The top six factors (URL Accessibility 9.5, Search Rank 9.4, Fan-out Rank 9.3, Preview Control 9.2, Query-Answer Match 9.2, Intent-Format Match 9.0) are all infrastructure-and-substrate signals: how the bot finds the page, how the page renders, whether the index returns it cleanly on fan-out. These map onto what I have been calling DSCRI for years, the infrastructure layer of the AI Engine Pipeline (Discovered, Selected, Crawled, Rendered, Indexed) that I covered in detail in the SEL piece on the five infrastructure gates behind crawl, render, and index. These factors score highest not because retrieval is the dominant driver of AI recommendation, but because most brands are still failing the absolute infrastructure gates that come before competition even starts: the ceiling on every other factor is being held down by a floor nobody has fixed. The chart is a snapshot of an industry where most of the variance still lives in the infrastructure half, because most brands have not done the foundational work that makes them legible to the bots in the first place.
The Domain Authority finding sharpens the diagnosis further. Domain Authority sits at 5.0 on the chart, well off the pace, and Domain Authority is the brand-trust signal that lives upstream of the DSCRI gates: it is the input that determines how readily the engines invest crawl budget, rendering budget, and indexing budget on the brand’s content in the first place, as Fabrice Canel of Microsoft Bing confirmed when he described how the system allocates bot resources based on expected return weighted by publisher entity authority. A 5.0 score on Domain Authority across 55 experiments tells you that brands aren’t just failing rendering and crawl-readiness, they’re failing the brand-trust foundation that determines how generously the engines invest in handling the brand’s substrate at every gate downstream of it. The infrastructure failure isn’t only mechanical; it’s compounded by an entity-trust failure that the engines see before the bot has even decided how thoroughly to crawl. Brands without Domain Authority enter DSCRI at a disadvantage; brands without DSCRI fitness compound the disadvantage further; and the chart is registering both failures simultaneously, which is why the infrastructure-related factors sit at the top and the competitive-phase factors below them.
The real competitive ground sits below the table-stakes infrastructure layer, in the ARGDW gates (Annotated, Recruited, Grounded, Displayed, Won), which I covered in the SEL piece on the five competitive gates hidden inside rank and display. Once a brand has cleared DSCRI, the question of who wins the recommendation moves to entity-level work: Knowledge Graph corroboration, recruitment logic per Trinity component, the Framing Gap supplied through pillar pages and the corroboration backbone, and which assistive engines weight which Trinity component most heavily, which I treated as a brand-side optimisation framework in Search, Answer, and Assistive Engine Optimization: a 3-part approach for SEL in April 2025. None of that is visible in a meta-analysis whose top scores are dominated by brands failing at the foundational layer. The chart says serving the bots is non-negotiable, which is correct, and that is exactly the level at which the LLM info page trend misallocates brand-side effort. The brand that builds an LLM info page instead of fixing its rendering, crawl-readiness, and index-cleanliness, and then doing the entity work that drives ARGDW recruitment, is treating the visible infrastructure gap as the competitive opportunity, when it is the floor everyone needs to be standing on before the actual competitive game begins.
Pillar pages do this work natively; the info page tries to fake it
The work the LLM info page is trying to do is the work pillar pages do natively, with the difference that pillar pages do it embedded in the context they need to make the work compound. Pillar pages live inside a topical hierarchy, link out to evidence pages and case studies, get linked back from blog posts that reference them, carry schema markup that names the entity and the topic and the relationship to the brand, and accumulate corroboration from third-party citations across the wider web that point to them as the canonical reference for the topic the brand owns. This is what wins recruitment across all three components of the Algorithmic Trinity, and it keeps winning whichever Trinity component a given assistive engine weights most heavily.
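The ordering argument, that nothing in the competitive ARGDW phase matters until every DSCRI gate is cleared, can be modelled as a strictly sequential pipeline. Treating each gate as a simple pass/fail check is a simplifying assumption made for illustration; only the gate names come from the text.

```python
from typing import Optional, Tuple

DSCRI = ["Discovered", "Selected", "Crawled", "Rendered", "Indexed"]  # infrastructure
ARGDW = ["Annotated", "Recruited", "Grounded", "Displayed", "Won"]    # competitive


def first_failed_gate(passed) -> Optional[Tuple[str, str]]:
    """Return (gate, phase) for the first gate not passed, or None if all are passed.

    Gates are checked strictly in order, so a later gate never counts
    while an earlier one is still failing.
    """
    phases = (("infrastructure (DSCRI)", DSCRI), ("competitive (ARGDW)", ARGDW))
    for phase, gates in phases:
        for gate in gates:
            if gate not in passed:
                return gate, phase
    return None


if __name__ == "__main__":
    # Hypothetical brand: built a "for AI" page, but its content renders client-side.
    print(first_failed_gate({"Discovered", "Selected", "Crawled"}))
    # Hypothetical brand: clean infrastructure, entity work not yet done.
    print(first_failed_gate(set(DSCRI) | {"Annotated"}))
```

The first brand is still stuck on the infrastructure floor; only the second has reached the competitive phase where entity work decides the outcome.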
The LLM info page does none of this. It sits alone at root level, points outward to other pages that may or may not have their own context, and offers the algorithms nothing they aren’t already getting from a well-structured About page, a clearly-marked-up website Entity Home, and a set of pillar pages that carry their own context and corroboration weight. Make your About page clear, make every page on your site clear, prioritise information through site architecture and internal linking, and you produce the recruitment outcome the LLM info page is reaching for, without violating single source of truth and without introducing a parallel canon the architecture cannot honour.
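In structured-data terms, one way to keep a single source of truth is to define the entity fully once, on the Entity Home, and have each pillar page reference it by @id rather than restate it. The sketch below builds schema.org JSON-LD with Python’s json module. The URLs, the organisation name, and the #organization fragment are hypothetical; the properties used (@id, sameAs, about, publisher, breadcrumb) are standard schema.org and JSON-LD vocabulary, shown as an illustration rather than as a recipe any engine has confirmed.

```python
import json

# Hypothetical URLs and name. The Entity Home is the one page that defines the entity.
ENTITY_HOME = "https://www.example.com/about"
ORG_ID = ENTITY_HOME + "#organization"


def entity_home_jsonld():
    """Full entity definition, stated once, on the Entity Home."""
    return {
        "@context": "https://schema.org",
        "@type": "Organization",
        "@id": ORG_ID,
        "name": "Example Co",
        "url": "https://www.example.com/",
        "sameAs": ["https://www.linkedin.com/company/example-co"],
    }


def pillar_page_jsonld(url, title, topic, parent_url, parent_title):
    """Pillar-page markup: names the topic, places the page in the hierarchy,
    and points at the entity by @id instead of restating its details."""
    return {
        "@context": "https://schema.org",
        "@type": "WebPage",
        "@id": url + "#webpage",
        "url": url,
        "name": title,
        "about": {"@type": "Thing", "name": topic},
        "publisher": {"@id": ORG_ID},
        "breadcrumb": {
            "@type": "BreadcrumbList",
            "itemListElement": [
                {"@type": "ListItem", "position": 1, "name": parent_title, "item": parent_url},
                {"@type": "ListItem", "position": 2, "name": title, "item": url},
            ],
        },
    }


if __name__ == "__main__":
    page = pillar_page_jsonld(
        "https://www.example.com/services/methodology", "How our methodology works",
        "Methodology", "https://www.example.com/services", "Services",
    )
    print('<script type="application/ld+json">')
    print(json.dumps(page, indent=2))
    print("</script>")
```

Because the pillar page carries a pointer to the entity rather than a second copy of its details, the markup contains only one statement of the entity for the algorithms to reconcile.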
The answer to optimising for AI is the answer it has always been
The trend rests on a category mistake, and the category mistake repeats a pattern from earlier eras of the same conversation. Every time a new AI surface emerges, brands look for a new optimisation artefact specific to that surface. For me, this is where the discipline of returning to the architecture matters most, because the architecture has been stable underneath for years. The answer to “how do we optimise for AI Assistive Engines?” is the answer to “how do we make the brand’s entity legible across the Algorithmic Trinity?”, which is the answer to “how do we win recruitment across knowledge graphs, LLMs, and search engines?”, which is the answer to “how do we feed the shared Web Index continuously with corroborated, well-annotated, context-rich content from a single source of truth?”
That answer doesn’t change because a new artefact gets invented at the brand-side end. Pillar pages, the website Entity Home, structured data, the corroboration backbone across third-party sources, and the operational practice of feeding the bot continuously through the push layer entry modes I covered in the SEL piece on the push layer: all of this work was already the answer five years ago, was already the answer when I formalised AEO in 2017, and is the answer now. The LLM info page isn’t new advice for the new era; it’s old advice misapplied, which makes it worse than no advice, because it consumes attention and effort that should be going to the work that actually compounds.
Make every page clear, prioritise information through site structure, internal linking, and schema markup, feed your pillar pages to the bot, maintain a single source of truth at the website Entity Home, and let the canonical signal accumulate. The architecture rewards that work consistently and rewards no other work, including the work of building a parallel canon at root level for an audience that doesn’t consume content the way the page assumes.