State of AI Citations 2026: What a Live Index Reveals About How AI Picks Sources

The flagship report from MentionRadar's research hub — the structural findings that hold across ChatGPT, Gemini and Grok, with every estimate labelled as one.

Updated May 2026 · 12 min read
The short answer

State of AI Citations 2026 is MentionRadar’s flagship report on how today’s AI assistants choose the sources they cite, based on a live query–domain index spanning ChatGPT, Gemini and Grok. The headline structural findings are these. The three models disagree more than they agree on which sources to cite for the same buyer question, so single-model optimisation under-counts your real exposure. AI answers are volatile: the cited source set for a query can shift week to week even when the underlying pages don’t change. Citations concentrate on a small set of domains per category, with a long tail that is rarely cited. And comparison, original-data and review-platform formats earn citations out of proportion to their share of the web. All figures here are directional estimates drawn from the index, not a census, and we say so wherever it matters.

How was this report compiled?

Every finding below comes from MentionRadar’s query–domain index: a background system that runs real buyer questions through ChatGPT, Gemini and Grok, extracts the domains each model cites or mentions, and records those query–domain links so they can be compared across models and over time. That design matters. A one-off prompt screenshot can’t tell you whether an answer is stable or whether two models agree, but an index that re-queries and stores results can. Where we state a number, treat it as a directional estimate from a sample of the index unless noted otherwise; the value is in the shape of the finding, not in false precision.

For the mechanics of how an inverted index of AI answers is built, see the Reverse AI Search pillar. For how a figure becomes publishable — what gets sampled, over what window, and how every estimate or attributed third-party number is labelled — see our methodology.
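To make the idea of a query–domain index concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not MentionRadar’s implementation: the class name `CitationIndex`, its methods, and the example domains are all hypothetical. It stores the domains each model cited for a query on a given run date, and keeps a reverse map so the same data can be read from domain back to queries.

```python
from collections import defaultdict
from datetime import date


class CitationIndex:
    """Toy query-domain index (hypothetical, for illustration only)."""

    def __init__(self):
        # (query, model) -> {run_date: set of cited domains}
        self._runs = defaultdict(dict)
        # domain -> set of (query, model) pairs, for reverse lookup
        self._by_domain = defaultdict(set)

    def record(self, query, model, run_date, domains):
        """Store the domains one model cited for one query on one date."""
        cited = set(domains)
        self._runs[(query, model)][run_date] = cited
        for domain in cited:
            self._by_domain[domain].add((query, model))

    def cited(self, query, model, run_date):
        """Return the cited domains for a stored run, or an empty set."""
        return self._runs.get((query, model), {}).get(run_date, set())

    def queries_for(self, domain):
        """Read the index backwards: which (query, model) pairs cite this domain?"""
        return sorted(self._by_domain.get(domain, set()))


index = CitationIndex()
run = date(2026, 5, 4)
index.record("best crm for startups", "chatgpt", run, ["a.example", "b.example"])
index.record("best crm for startups", "grok", run, ["b.example"])
print(index.queries_for("b.example"))
```

Because every run is keyed by date, the same structure supports both comparisons the report relies on: across models for one date, and across dates for one model.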

Finding 1: The three models disagree more than they agree

The single most important thing the index shows is that ChatGPT, Gemini and Grok frequently cite different sources for the same question. Some of that is expected — different training data, different retrieval, different recency — but the practical consequence is large: a domain that owns a query in ChatGPT may be absent from Grok’s answer entirely.

If you measure visibility against one model, you are systematically mis-estimating your real exposure. The deeper breakdown — how we compute overlap and what it means for strategy — is in how often do ChatGPT, Gemini & Grok disagree on sources. The takeaway for this report: optimise for, and measure across, all three models.
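One common way to put a number on cross-model disagreement is the Jaccard similarity of the cited domain sets: the size of the intersection divided by the size of the union. The sketch below assumes that metric purely for illustration; the report does not state which overlap measure it uses, and the function names and example domains are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two domain sets: |A and B| / |A or B|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0  # two empty answers are treated as identical
    return len(a & b) / len(union)


def pairwise_overlap(citations_by_model):
    """Return the Jaccard similarity for every pair of models."""
    return {
        (m1, m2): jaccard(citations_by_model[m1], citations_by_model[m2])
        for m1, m2 in combinations(sorted(citations_by_model), 2)
    }


# Hypothetical citations for a single buyer question.
citations = {
    "chatgpt": {"a.example", "b.example", "c.example"},
    "gemini": {"b.example", "c.example", "d.example"},
    "grok": {"d.example", "e.example"},
}
overlap = pairwise_overlap(citations)
print(overlap[("chatgpt", "gemini")])  # 0.5
print(overlap[("chatgpt", "grok")])    # 0.0
print(overlap[("gemini", "grok")])     # 0.25
```

In this made-up example a domain cited only by ChatGPT would look like a clear win on a single-model dashboard while being invisible in Grok, which is the pattern Finding 1 describes.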

Finding 2: AI answers are volatile

Re-running the same query over time reveals that the cited source set is not fixed. For many questions, the domains an AI names this week differ from last week’s — sometimes because a page changed, often because the model’s retrieval or weighting shifted underneath an otherwise unchanged web.

That has two implications. First, a single check is a snapshot, not a verdict — you need to monitor over time to know whether a win is real. Second, volatility is itself a signal: a steady citation is a stronger asset than a flickering one. We unpack measurement and what drives churn in AI citation volatility.
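Volatility can be sketched as churn between consecutive snapshots of the same query. The example below assumes churn is defined as one minus the Jaccard similarity of two consecutive cited sets, and treats a "steady" citation as a domain present in every snapshot. Both definitions are assumptions made for illustration, as are the function names and data.

```python
def churn(previous, current):
    """Share of the combined source set that changed between two runs."""
    previous, current = set(previous), set(current)
    union = previous | current
    if not union:
        return 0.0  # nothing cited in either run, so nothing changed
    return 1 - len(previous & current) / len(union)


def mean_churn(snapshots):
    """Average churn across consecutive snapshots, ordered oldest first."""
    pairs = list(zip(snapshots, snapshots[1:]))
    if not pairs:
        return 0.0
    return sum(churn(prev, cur) for prev, cur in pairs) / len(pairs)


def steady_domains(snapshots):
    """Domains cited in every snapshot."""
    if not snapshots:
        return set()
    return set.intersection(*(set(s) for s in snapshots))


# Hypothetical weekly snapshots for one query in one model.
weeks = [
    {"a.example", "b.example", "c.example"},
    {"a.example", "b.example", "d.example"},
    {"a.example", "d.example", "e.example"},
]
print(mean_churn(weeks))      # 0.5
print(steady_domains(weeks))  # {'a.example'}
```

Here only one domain survives all three weeks, which is the distinction the text draws between a steady citation and a flickering one.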

Finding 3: Citations concentrate on a few domains per category

Within any given category, a small set of domains tends to absorb the majority of citations, while a long tail of sites is cited rarely or never. This mirrors the familiar power-law shape of organic search, and it sets realistic expectations: in a concentrated category, being cited at all puts you ahead of most competitors, and displacing an incumbent is a multi-query campaign, not a single content fix.

Knowing your category’s concentration tells you how hard the climb is. We publish per-category reference ranges in category share-of-voice benchmarks.
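Concentration can be summarised in more than one way. The sketch below shows two standard measures, the share of citations held by the top k domains and the Herfindahl-Hirschman index (the sum of squared shares). The choice of these measures is an assumption for illustration; the report does not say how its category benchmarks are computed, and the counts are invented.

```python
from collections import Counter


def top_k_share(citation_counts, k):
    """Fraction of all citations that go to the k most-cited domains."""
    total = sum(citation_counts.values())
    if total == 0:
        return 0.0
    top = sum(count for _, count in Counter(citation_counts).most_common(k))
    return top / total


def hhi(citation_counts):
    """Herfindahl-Hirschman index: sum of squared citation shares (0 to 1)."""
    total = sum(citation_counts.values())
    if total == 0:
        return 0.0
    return sum((count / total) ** 2 for count in citation_counts.values())


# Hypothetical citation counts for one category.
counts = {
    "a.example": 50,
    "b.example": 30,
    "c.example": 10,
    "d.example": 5,
    "e.example": 5,
}
print(top_k_share(counts, 2))  # 0.8
print(round(hhi(counts), 3))   # 0.355
```

A high top-k share or a high index value both point to the same practical reading: a few incumbents absorb most citations, so displacing one is likely to take sustained work across many queries.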

Finding 4: Format matters — some content types punch above their weight

Looking at what gets cited, certain formats appear far more often than their share of the web would predict: head-to-head comparisons and “best” lists, pages carrying original data, structured documentation, and review platforms. The common thread is extractability — content that gives a model a clean, self-contained, attributable answer to lift. The full breakdown is in which content types get cited most by AI, and the specific pull of review sites is examined in the review-platform effect.
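"Far more often than their share of the web would predict" can be expressed as a simple ratio: a format's share of citations divided by its share of pages. A value above 1 means the format is over-represented among citations. The sketch below assumes you have page counts per format for both cited pages and a web baseline; the function name, format labels, and numbers are hypothetical.

```python
def format_lift(cited_counts, web_counts):
    """Citation share divided by web share, per content format.

    Formats with no pages in the web baseline are skipped, since the
    ratio is undefined for them.
    """
    total_cited = sum(cited_counts.values())
    total_web = sum(web_counts.values())
    if total_cited == 0 or total_web == 0:
        return {}
    lift = {}
    for fmt, web in web_counts.items():
        if web == 0:
            continue
        cited = cited_counts.get(fmt, 0)
        # Equivalent to (cited / total_cited) / (web / total_web).
        lift[fmt] = (cited * total_web) / (web * total_cited)
    return lift


# Hypothetical page counts by format.
cited = {"comparison": 30, "blog post": 20, "other": 50}
web = {"comparison": 10, "blog post": 40, "other": 50}
lift = format_lift(cited, web)
print(lift["comparison"])  # 3.0
print(lift["blog post"])   # 0.5
print(lift["other"])       # 1.0
```

The ratio is only as good as the baseline: how pages are classified by format and which sample stands in for "the web" both change the result, so it is best read directionally.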

Finding 5: Prompt volume is directional, not gospel

A popular idea in AI-search circles is that “prompt volume” (how often a question is asked of AI) can be measured as precisely as keyword search volume. The honest read is that prompt-volume figures are inferred and noisy: useful for prioritisation, dangerous as a precise input. We make the case in does prompt volume mean anything. The report’s position: use it to rank, never to forecast.

What this means for your 2026 strategy

  • Measure across three models, not one. Disagreement is the norm; a single-model number under-counts you.
  • Monitor over time. Volatility means one check is a snapshot. Track the queries that matter so you see churn early.
  • Pick queries by category reality. In a concentrated category, prioritise ruthlessly and expect a campaign, not a quick win.
  • Write for extraction. Lead with a self-contained answer, structure for scanning, and publish formats that earn citations.
  • Treat volume as a tiebreaker. Use prompt-volume estimates to order work, not to promise outcomes.

See your own slice of the data

This report is the aggregate view. Your own view is one query away: the free Domain Check reads the same index backwards and returns the real queries for which ChatGPT, Gemini and Grok already cite your domain, across all three models and with no signup. Run yours, then a competitor’s, and you’ll see Findings 1 and 3 in your own category in seconds.

Frequently asked questions

Does this report contain hard statistics?

No. The 2026 report states the structural findings — disagreement, volatility, concentration, format effects — as directional patterns, not precise percentages. We do not publish a number until it is sourced and dated per our methodology.

What is the single most important finding?

That ChatGPT, Gemini and Grok disagree more than they agree on which sources to cite for the same question — so a single-model visibility number systematically under-counts your real exposure. See how often the models disagree.

Why does volatility matter for my strategy?

Because the cited source set for a query can shift between checks, a single check is a snapshot, not a verdict. You need to monitor over time to know whether a citation is durable — covered in AI citation volatility.

How do I apply this report to my own domain?

Run the free Domain Check, then run a competitor’s. You will see the disagreement and concentration findings play out in your own category within seconds, using the same index this report draws on.