Topical Answer Adoption: How to Measure Which Brand the AI Engine Learned From (and Solve TOFU Attribution in the AI Era)
A Strategy Sandbox piece by Jason Barnard, 25 May 2026
Status: Original concept, first publication.
For twenty years, top-of-funnel attribution was the measurement problem nobody could solve, and the absence of a solution wasn’t for lack of trying: marketing mix modelling tried econometric regression, view-through tracking tried impression credit, last-click reporting gave up and credited only the final interaction, and multi-touch attribution spread the credit fractionally across an unobservable journey. None of it worked, because the buyer’s journey from awareness to conversion was structurally invisible to whoever wanted to measure it, and the credit assignment was best-effort guesswork dressed up as data. The AI era inherited the problem and made it worse before it made it solvable: the buyer’s journey moved off the brand’s measurable surfaces and onto the engine’s conversational ones, the journey compressed from weeks to minutes, and the engine doing the recommending didn’t expose its reasoning. Triple Opacity made the AI-era version of the problem worse than the search-era version it inherited.
Then something shifted in 2026. The same property that made AI-era top-of-funnel attribution structurally harder also made one operational solution possible. The engine’s answer is now legible text rather than a ten-blue-link ranking, the candidate corpora that could have taught the engine are scrapeable rather than locked inside ad-network black boxes, and the comparison between the two can be scored with the same generation of AI tooling that produced the recommendation in the first place. The method has been hiding in plain sight, and the version I’m publishing here is an operational instrument the field will need to refine collectively.
The measurement is topical answer adoption. The scoring mechanism is corpus similarity. The position where it operates is the twigs of the Funnel Query Pathway.
A refinement to the Funnel Query Pathway metaphor
Earlier in 2026 I described the Funnel Query Pathway as an inverted tree, with conversion at the root and awareness branches projecting upward. The metaphor was right at the root level and underspecified at the system level, and the refinement matters because the methodology in this piece operates at one specific layer of the structure.
The brand’s complete strategic landscape is an orchard. Many trees, deliberately planted, cultivated across seasons. Each tree is one cohort-with-intent intersection: XL men buying a red shirt is one tree, luxury travellers booking Bali is another, parents shopping for kids’ summer kit is a third. The orchard grows as the brand cultivates more cohort-intent intersections, and the cultivation analogy is operationally honest because brands running this methodology are building an asset that compounds across seasons rather than across weeks.
Each tree has three parts. The trunk is the conversion node, one representative branded BOFU query that stands for the buying moment for that cohort-with-intent intersection. In the Kalicube Pro™ tracking implementation, one trunk query represents the entire tree as the period-over-period read: the FAQ page on the brand’s site can carry as many variants of the BOFU query as the brand wants, but the methodology tracks one trunk query per tree because the trunk is the structural read on whether the tree is producing fruit. The branches are the MOFU evaluation queries, where the engine reasons at the brand-versus-competitor layer. The twigs are the TOFU awareness queries, the most numerous part of the tree and the layer where the engine reasons at the topical level rather than at the brand level. Topical answer adoption operates on the twigs.
The ground of the orchard is the brand itself. Not the marketing layer, not the website, but the business as a whole: the operational substrate that gives the trees something to grow from, and the place where the apples fall when the trees bear fruit. The ground is what makes the orchard productive over time, and the brand that lets its ground go fallow watches its trees die regardless of how well its branches were optimised.
This piece treats one measurement at one position: the twigs.
The diagnostic question topical answer adoption answers
At the twig layer of any tree, the buyer hasn’t named a brand yet and isn’t asking about brands. The buyer is asking the topical question that sits upstream of the buying motion: “can men wear red shirts to work” is the twig that routes toward Uniqlo’s red-shirt trunk. “What’s the best beach in Bali for honeymooners” is the twig that routes toward a luxury hotel’s conversion node. The engine answering those twig queries isn’t reasoning about brands. The engine is reasoning about the topic, drawing on whatever content the field has produced about the topic, and the brand whose content the engine reasoned from is the brand that earned the awareness-layer position.
The diagnostic question at the twig layer is therefore different from the diagnostic question at the trunk or the branches. At the trunk, the question is “did the engine surface us correctly when our buyer named us.” At the branches, “did the engine recommend us when our buyer was comparing us against competitors.” At the twigs, the question is harder: “whose content did the engine learn from when it answered the topical question our buyer asked before they had a brand in mind.”
That’s the top-of-funnel attribution question, and it’s the one search-era attribution couldn’t answer because the engine’s answer at the awareness layer was never legible enough to score against candidate corpora.
The inversion: stop asking the engine, score the engine’s answer
The conceptual move that makes corpus similarity work is the inversion of how attribution was historically attempted. Search-era attribution tried to ask the system where the credit belonged: multi-touch models asked the analytics pipeline, view-through tracking asked the cookie, marketing mix modelling asked the regression. None of them got a good answer because the system being asked didn’t know.
The AI-era inversion is to stop asking the engine and start scoring the engine’s output. The engine produced an answer to the topical query. The answer is text. The candidate corpora that could have taught the engine are also text, and they’re scrapeable: your content corpus, each tracked competitor’s content corpus, the broader topical corpus the engine might have drawn from. Score the engine’s answer for textual similarity against each candidate corpus, and the corpus that scores highest is the corpus the engine most likely learned from.
The methodology stops asking the unanswerable question (why did the engine recommend this?) and starts asking the answerable one (whose content does the engine’s answer look like?). The latter is solvable. The former never will be.
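To make the inversion concrete, here is a minimal sketch of scoring one engine answer against several candidate corpora. It is an illustration only: it uses a bag-of-words cosine similarity as a crude lexical stand-in, not the strict-rubric AI evaluator described later in this piece, and the function names (`tokenize`, `cosine_similarity`, `rank_corpora`) are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> Counter:
    """Lowercase word counts: a deliberately crude lexical representation."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine between two word-count vectors, from 0.0 (no overlap) to 1.0."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_corpora(answer: str, corpora: dict[str, str]) -> list[tuple[str, float]]:
    """Score one engine answer against each candidate corpus, highest first."""
    answer_vec = tokenize(answer)
    scores = {
        name: cosine_similarity(answer_vec, tokenize(text))
        for name, text in corpora.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Placeholder texts, not real brand content.
    answer = "red shirts work well at the office"
    corpora = {
        "your_corpus": "red shirts at the office work well",
        "competitor_corpus": "beach holidays in bali",
    }
    for name, score in rank_corpora(answer, corpora):
        print(f"{name}: {score:.2f}")
```

Word overlap alone captures only one of the dimensions the full method compares, so a proxy like this is useful for seeing the shape of the comparison rather than for producing scores to act on.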
Topical answer adoption: the measurement
Topical answer adoption is the measurement that puts the inversion into practice. For each twig query in the tree, the methodology does five things in sequence.
It submits the twig query to each tracked assistive engine at temperature zero. It captures the engine’s full text answer. It scores the engine’s answer for textual similarity against the brand’s content corpus, using a strict-rubric AI evaluator at temperature zero. It repeats the scoring against each tracked competitor’s content corpus. It records the scores period-over-period.
The output is one similarity score per cell in a grid of twig queries by candidate corpora. A brand running ten twigs against five candidate corpora produces fifty scores per engine per period, and the period-over-period delta on each score is the trend the methodology reads.
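The five steps can be sketched as a loop that fills the twig-by-corpus grid. This is a sketch under assumptions: `ask_engine` and `score_similarity` are hypothetical callables you would supply (an engine client and an evaluator), and the `Cell` record and function names are illustrative rather than part of any published implementation.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical interfaces: the method names the steps, not an API.
AskEngine = Callable[[str, str], str]          # (engine, twig_query) -> answer text
ScoreSimilarity = Callable[[str, str], float]  # (answer, corpus_text) -> score


@dataclass(frozen=True)
class Cell:
    period: str
    engine: str
    twig: str
    corpus: str
    score: float


def run_period(period: str, engines: list[str], twigs: list[str],
               corpora: dict[str, str], ask_engine: AskEngine,
               score_similarity: ScoreSimilarity) -> list[Cell]:
    """Fill the grid for one period: one score per engine, twig, and corpus."""
    cells = []
    for engine in engines:
        for twig in twigs:
            answer = ask_engine(engine, twig)        # steps 1-2: submit, capture
            for name, text in corpora.items():       # steps 3-4: brand, then competitors
                score = score_similarity(answer, text)
                cells.append(Cell(period, engine, twig, name, score))
    return cells                                     # step 5: keep per period


def deltas(previous: list[Cell], current: list[Cell]) -> dict[tuple, float]:
    """Period-over-period change for every cell present in both periods."""
    before = {(c.engine, c.twig, c.corpus): c.score for c in previous}
    return {
        (c.engine, c.twig, c.corpus): c.score - before[(c.engine, c.twig, c.corpus)]
        for c in current
        if (c.engine, c.twig, c.corpus) in before
    }
```

With one engine, ten twigs, and five corpora, `run_period` returns fifty cells, matching the count above; each additional tracked engine adds another fifty.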
The measurement is called topical answer adoption because the score measures how thoroughly the engine has adopted the topical answer from a specific corpus. High adoption means the engine reasoned from that corpus heavily. Low adoption means the engine reasoned from elsewhere. The score is the operational read.
Corpus similarity: the scoring mechanism
The scoring mechanism is corpus similarity, computed by a strict-rubric AI evaluator at temperature zero. Three structural choices in the implementation matter, and they need naming explicitly because they determine whether the score is defensible across periods.
Strict rubric. The evaluator runs against a fixed scoring rubric that names the dimensions being compared (lexical overlap, conceptual alignment, structural similarity, position emphasis, fact alignment), the weight applied to each dimension, and the score ranges that map to each output value. The rubric is locked at the start of a measurement programme and not modified mid-programme, because a delta computed against a moving rubric is noise pretending to be signal.
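One way to hold a rubric fixed in code is to make it an immutable object with a version label. The sketch below assumes the five dimensions named above; the equal weights in the usage example are placeholders, not Kalicube's actual weights, and the score-range mapping the rubric also fixes is left out for brevity. All names are hypothetical.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Mapping

# The five dimensions named in the text.
DIMENSIONS = (
    "lexical_overlap",
    "conceptual_alignment",
    "structural_similarity",
    "position_emphasis",
    "fact_alignment",
)


@dataclass(frozen=True)
class Rubric:
    version: str
    weights: Mapping[str, float]

    def __post_init__(self) -> None:
        if set(self.weights) != set(DIMENSIONS):
            raise ValueError("rubric must weight exactly the five named dimensions")
        if abs(sum(self.weights.values()) - 1.0) > 1e-9:
            raise ValueError("weights must sum to 1.0")

    def combine(self, dimension_scores: Mapping[str, float]) -> float:
        """Weighted sum of per-dimension scores, each expected in [0, 1]."""
        for name in DIMENSIONS:
            if not 0.0 <= dimension_scores[name] <= 1.0:
                raise ValueError(f"{name} score must be between 0 and 1")
        return sum(self.weights[name] * dimension_scores[name] for name in DIMENSIONS)


def lock_rubric(version: str, weights: dict) -> Rubric:
    """Copy the weights into a read-only mapping so they cannot drift mid-programme."""
    return Rubric(version, MappingProxyType(dict(weights)))


if __name__ == "__main__":
    rubric = lock_rubric("v1", {name: 0.2 for name in DIMENSIONS})  # placeholder weights
    print(rubric.combine({name: 0.5 for name in DIMENSIONS}))
```

Storing the `version` label next to every recorded score makes it possible to check later that a delta was computed against the same rubric on both sides.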
AI evaluator at temperature zero. The scorer is an AI model running at temperature zero, which is the closest the current generation of large language models comes to deterministic output. Near-deterministic output is what makes period-over-period comparison defensible. The same twig query, run through the same engine, producing the same answer, scored by the same evaluator at temperature zero against the same corpus, should produce the same similarity score this quarter as last quarter. When the score drifts with the rubric and evaluator held constant, the drift most likely reflects a real change in the underlying behaviour rather than a measurement artefact.
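A small guard can enforce the "same setup on both sides" condition before any delta is read. This sketch assumes that each stored score carries a fingerprint of the measurement setup (rubric version, evaluator model, temperature, twig query); the function names and record layout are hypothetical.

```python
import hashlib
import json
from typing import Optional


def setup_fingerprint(rubric_version: str, evaluator_model: str,
                      temperature: float, twig_query: str) -> str:
    """Hash the parts of the measurement that must not change between periods."""
    payload = json.dumps(
        {
            "rubric": rubric_version,
            "model": evaluator_model,
            "temperature": temperature,
            "twig": twig_query,
        },
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def comparable_delta(previous: dict, current: dict) -> Optional[float]:
    """Return the score change, or None when the setup itself changed."""
    if previous["fingerprint"] != current["fingerprint"]:
        # A delta here would mix a measurement change with a real change.
        return None
    return current["score"] - previous["score"]
```

The engine's answer and the corpora are deliberately left out of the fingerprint: those are the things expected to change between periods, and their change is what the delta is meant to capture.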
Model choice. In Kalicube’s testing, GPT-4 at temperature zero is the most literal-rules-following of the frontier models for strict-rubric scoring work. Claude at temperature zero remains interpretive in ways that produce variation across identical inputs. Gemini at temperature zero leans consensus-seeking in ways that smooth out the very signals the methodology is trying to surface. Kalicube runs GPT-4 for scoring. Other practitioners may settle on different models as the frontier evolves, and the model choice is publicly nameable so the field can compare notes.
The combination of strict rubric, temperature zero, and stable model choice across periods is what makes the score defensible. Not precise (the AI era doesn’t support precision), but defensible: replicable across analysts running the same rubric, comparable across periods running the same evaluator, and stable enough that the trend across quarters is the signal the methodology reads.
Reading the score: high similarity, low similarity, the patterns
The score itself is a number per twig-by-corpus cell. The interpretation matters more than the number.
A high similarity score between the engine’s answer and your corpus means the engine has adopted your content as the topical answer for the twig in question. The engine is reasoning from your corpus when it answers buyers in your cohort at the awareness layer. You earned the position, the engine learned from you, and the cohort that walks the path from this twig downward toward the trunk is walking through an orchard where your content shaped the awareness.
A low similarity score between the engine’s answer and your corpus means the engine has adopted someone else’s content as the topical answer. The someone else is usually identifiable: the corpus that scored highest against the engine’s answer, across the candidate corpora you’re tracking, is the corpus the engine most likely learned from. If your competitor’s corpus scored 0.78 and yours scored 0.31, the engine is reasoning from your competitor’s content when it answers the twig.
The pattern across twigs is the diagnostic. Twigs where you score highest are the topical areas where you’ve earned the awareness-layer position. Twigs where a specific competitor consistently scores highest are the topical areas that competitor has owned. Twigs where multiple candidate corpora score moderately and nobody dominates are the topical areas where the engine is drawing from a wider pool, often the generic web rather than any specific brand corpus.
Each pattern points to a different operational move. Twigs you own get reinforced. Twigs a competitor owns get attacked through targeted content investment. Twigs where the pool is shared get treated as opportunities for category leadership, where consistent investment over multiple quarters can shift the engine’s adoption from generic to brand-specific.
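The three patterns and their moves can be sketched as a simple classifier over one twig's scores. The thresholds here (`high` and `margin`) are assumptions chosen for illustration; the text does not specify numeric cut-offs, and any real programme would need to calibrate its own.

```python
# Moves paraphrased from the text; the pattern labels are hypothetical names.
MOVES = {
    "owned": "reinforce",
    "competitor-owned": "attack through targeted content investment",
    "shared pool": "invest consistently for category leadership",
}


def classify_twig(scores: dict[str, float], brand: str,
                  high: float = 0.6, margin: float = 0.15) -> dict:
    """Classify one twig from its per-corpus scores.

    `high` is the minimum score that counts as dominance and `margin` is the
    minimum lead over the runner-up. Both are illustrative assumptions.
    """
    if brand not in scores:
        raise ValueError("the brand's own corpus must be among the scored corpora")
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    leader, top = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
    if top < high or (top - runner_up) < margin:
        pattern = "shared pool"        # nobody dominates
    elif leader == brand:
        pattern = "owned"
    else:
        pattern = "competitor-owned"
    return {"pattern": pattern, "leader": leader, "move": MOVES[pattern]}


if __name__ == "__main__":
    # The 0.78 versus 0.31 case from the text.
    print(classify_twig({"you": 0.31, "competitor": 0.78}, brand="you"))
```

A "shared pool" result with uniformly low scores can also mean the engine drew on a corpus outside the tracked set, so that label is a prompt to investigate rather than a conclusion.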
A worked example: Uniqlo’s red shirt twig
The Uniqlo case from the SEL Article 14 walkthrough makes the methodology concrete. The tree being tracked is “XL men buying a red shirt.” The trunk is a branded BOFU query like “Uniqlo XL men’s red shirt.” The twigs are awareness queries the same cohort asks before the brand is named, including the legible one I used as the example: “can men wear red shirts to work.”
The methodology runs the twig query through each tracked engine. The engine produces a text answer about whether men can wear red shirts to work, what contexts make it more or less appropriate, what shades work for which environments. The methodology captures the full text of the answer and scores it for corpus similarity against Uniqlo’s content corpus, Zara’s corpus, H&M’s corpus, ASOS’s corpus, and any other tracked competitor.
Three outcomes are interesting. If Uniqlo scores highest, the engine has adopted Uniqlo’s content as the topical answer for the red-shirt-at-work question. Uniqlo owns the awareness layer for that twig. If a competitor scores highest, the engine has adopted that competitor’s content. Uniqlo doesn’t own the awareness layer, and the buyer arriving at the trunk from this twig walked through a topical conversation shaped by a competitor. If nobody scores particularly high, the engine is reasoning from a broader topical corpus (men’s style blogs, fashion magazines, generic advice content), which is its own diagnostic: the category as a whole hasn’t earned awareness-layer ownership and the field is open.
The Uniqlo case is the worked illustration. The same structure applies to any cohort-with-intent intersection. Replace the cohort, the intent, and the twig queries with your own, and the methodology runs identically.
Defending the methodology against the critical replies
This is the most novel piece of the Kalicube Macro Method and the one most likely to draw critical replies, so the objections are named explicitly here and answered in order.
“The engine’s answer is non-deterministic, so the similarity score is meaningless.” The objection is real at micro scale and dissolves at the scale the methodology runs. The engine produces answers that vary across users and sessions due to personalisation and stochastic generation, but the methodology runs at volume against API endpoints and browser-emulation runs that carry no user signature, and the central tendency across many runs is stable. The similarity score is computed against the central tendency, not against a single run, and the period-over-period delta is the signal the methodology reads. Single-run variance averages out at scale.
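One reading of "central tendency across many runs" is to score each run separately and take the median of those scores, then compare medians between periods. That reading is an assumption, as are the function names below; the text does not say which statistic is used.

```python
from statistics import median, pstdev


def summarise_runs(run_scores: list[float]) -> dict:
    """Collapse many single-run scores for one cell into a central value and spread."""
    if not run_scores:
        raise ValueError("at least one run is required")
    return {
        "central": median(run_scores),   # robust to the odd outlier run
        "spread": pstdev(run_scores),    # how noisy the individual runs were
        "runs": len(run_scores),
    }


def period_delta(previous_runs: list[float], current_runs: list[float]) -> float:
    """Change in the central score between two periods for the same cell."""
    return median(current_runs) - median(previous_runs)


if __name__ == "__main__":
    # Illustrative numbers only.
    print(summarise_runs([0.25, 0.5, 0.75]))
    print(period_delta([0.25, 0.5, 0.75], [0.5, 0.75, 1.0]))
```

Keeping the spread alongside the central value gives a rough sense of whether a period-over-period delta is larger than the run-to-run noise within a period.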
“Corpus similarity doesn’t prove the engine learned from that corpus, it proves the answer happens to look like it.” Correct, and the methodology doesn’t claim causal proof. It claims a defensible diagnostic: when the engine’s answer scores highest against a specific corpus across a set of related queries over multiple periods, the most parsimonious explanation is that the engine has been trained on or is grounding from that corpus. The diagnostic doesn’t need causal proof to be operationally useful. It needs to be replicable, comparable across periods, and stable enough that the trend is the signal.
“What if the engine learned from a corpus you’re not tracking?” Then even your highest-scoring tracked corpus will score lower than the actual source would have, the score gap between candidates will compress, and the pattern across queries will point you toward investigating which untracked corpus is in play. The methodology surfaces the gap rather than guessing what’s in it.
“What if multiple corpora cover similar ground and the score is ambiguous?” Then the diagnostic for that twig is ambiguous, and the methodology says so. The score is the data, the brand’s interpretation is the analysis, and the brand running this methodology accepts that some twigs produce clear reads and some don’t. Act on the clear ones, monitor the ambiguous ones.
“Why would I trust an AI evaluator to score AI output? Won’t both share blind spots?” Possibly, but the alternative is human evaluators scoring AI output at the scale this methodology requires, which is not operationally feasible. The methodology accepts the AI-evaluating-AI constraint and mitigates it through strict-rubric discipline and temperature zero, on the principle that a known, replicable methodology is more useful than a perfect methodology that doesn’t run at scale.
Position within the Kalicube Macro Method
Topical answer adoption is one of three measurements that run at the twig position of any tree in the orchard. The other two are simpler and worth naming for completeness.
Brand appearance at the twig layer. A binary read: does your brand surface at all in the engine’s topical answer? Most twigs at most engines produce no brand surfacing because the engine is reasoning about the topic rather than recommending brands, but the twigs where your brand does surface are the ones where the engine has decided you’re topically authoritative enough to name even when the buyer didn’t ask about brands. Track which twigs surface your brand period-over-period and the pattern is its own read on awareness-layer authority.
Competitor creep at the twig layer. When your brand doesn’t surface, which competitors do? The brands that surface at the twig layer when yours doesn’t are the brands the engine treats as topically authoritative for that category. If the same three competitors keep surfacing across your twigs and your brand keeps missing, the topical authority deficit is the structural problem that needs addressing before the trunk and the branches can move.
Topical answer adoption via corpus similarity. The measurement this piece treats. The deepest read of the three, and the one that names whose corpus the engine learned from rather than just whose brand the engine surfaced.
Together, the three measurements give a complete picture of the twig layer. Appearance tells you whether the engine knows you exist in this topical category. Creep tells you who the engine knows instead. Topical answer adoption tells you whose content the engine actually reasoned from when constructing the answer.
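The two simpler reads can be sketched as name matching over the engine's answer. This is a minimal illustration that assumes a brand counts as surfaced when its name appears as a whole word, case-insensitively; real answers may need alias handling, and the function names are hypothetical.

```python
import re


def brands_named(answer: str, brands: list[str]) -> set[str]:
    """Return the brands whose names appear as whole words in the answer."""
    found = set()
    for brand in brands:
        # Lookarounds instead of \b so names such as "H&M" match correctly.
        pattern = rf"(?<!\w){re.escape(brand)}(?!\w)"
        if re.search(pattern, answer, flags=re.IGNORECASE):
            found.add(brand)
    return found


def twig_read(answer: str, own_brand: str, competitors: list[str]) -> dict:
    """Brand appearance (binary) and competitor creep for one twig answer."""
    named = brands_named(answer, [own_brand, *competitors])
    appears = own_brand in named
    # Creep is read only when the brand itself is absent from the answer.
    creep = [] if appears else sorted(named - {own_brand})
    return {"appearance": appears, "creep": creep}


if __name__ == "__main__":
    # Placeholder answer text, not real engine output.
    answer = "Zara and H&M both sell red shirts suited to the office."
    print(twig_read(answer, "Uniqlo", ["Zara", "H&M", "ASOS"]))
```

Recording these two reads per twig alongside the similarity scores keeps all three twig-layer measurements in one period-over-period table.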
Why publish the methodology openly
The methodology is published here as a research instrument the field can refine, and the openness is deliberate.
Kalicube operates commercially at the trunk position of the orchard, on brand narrative integrity at the buying moment, where Kalicube Pro™ tracks one representative BOFU query per tree as the period-over-period read on whether the engine is recommending the brand accurately when buyers name it. That’s the productised service. The twig-layer methodology, including topical answer adoption via corpus similarity, is the wider methodology that the field strengthens by sharing. Other practitioners will build instruments at the twig and branch layers that operationalise the same macro discipline through different implementations, and the field gets stronger as more instruments get built, named, dated, and compared in public.
For me, the discipline of top-of-funnel attribution in the AI era doesn’t belong to any one consultancy. The instruments that operationalise it can. The methodology is dated to this article, the implementation choices are nameable (strict rubric, GPT-4 at temperature zero, scrapeable corpora as the comparison set), and the operational pattern is reproducible. Practitioners are encouraged to run it, refine it, publish their variations, and challenge the implementation choices. That’s how the methodology becomes a discipline.
First Publication Notice
The methodology, the measurement (topical answer adoption), the scoring mechanism (corpus similarity), the operational implementation (strict-rubric AI evaluator at temperature zero, GPT-4 for scoring), and the refinement to the Funnel Query Pathway metaphor (orchard, trees with trunks, branches, and twigs, with the brand as the ground of the orchard) are published here for the first time on 25 May 2026.
These concepts, the framework integration, and the operational walkthrough are original contributions by Jason Barnard (Kalicube).
For the structural treatment of the Funnel Query Pathway itself, see the Strategy Sandbox piece dated 12 May 2026.