Pillar E · Research

The State of AI Citations: What the Data Says About How AI Picks Its Sources

A standing research hub powered by MentionRadar's live query–domain index — model disagreement, citation volatility, content-type win rates, and category benchmarks across ChatGPT, Gemini and Grok.

Updated May 2026 · Research hub

The short answer

The State of AI Citations is MentionRadar’s ongoing research program into how AI assistants choose the sources they cite. It is built on a native query–domain index: a background system that continuously runs real buyer questions through ChatGPT, Gemini and Grok, records which domains each model cites or mentions, and stores those query–domain pairs so they can be measured over time. That index lets us measure what most analyses cannot: how often the three models disagree on which sources to cite, how quickly an AI answer’s source set changes (citation volatility), which content formats get cited most, and how citations are shared out within a category (share-of-voice benchmarks). We label every estimate as an estimate and tie claims to the index or to the three models’ observable behaviour, never to fabricated metrics. The goal is to replace AI-search hype with numbers you can act on.

What is the State of AI Citations research?

Most commentary on AI search is anecdote dressed as data — a screenshot of one ChatGPT answer, a vendor’s single visibility number, a confident claim with no method behind it. This hub takes the opposite approach. Every finding here comes from the same place MentionRadar’s product does: a query–domain inverted index that records, query by query and model by model, which domains the major AI assistants cite or mention. Because the index is a measurement instrument rather than a one-off survey, it can answer questions a snapshot can’t — namely, what changes over time and where the three models diverge.

If you are new to the underlying mechanic, start with the Reverse AI Search pillar, which explains how an inverted index of AI answers is built and why it lets you look up a domain and read its AI citation footprint. This research hub is what happens when you point that same instrument at the whole index instead of a single domain.
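
To make the "read the index backwards" idea concrete, here is a minimal Python sketch of a query–domain inverted index. It is an illustration only: the record shape (query, model, domain), the function names `build_index` and `footprint`, and the sample domains are all assumptions, not MentionRadar's actual schema or code.

```python
from collections import defaultdict

# Hypothetical observations: (query, model, cited domain).
# The shape and the values are illustrative assumptions.
observations = [
    ("best crm for startups", "chatgpt", "example-crm.com"),
    ("best crm for startups", "gemini", "example-crm.com"),
    ("best crm for startups", "grok", "reviews.example.org"),
    ("crm pricing comparison", "chatgpt", "reviews.example.org"),
]


def build_index(rows):
    """Map each domain to the set of (query, model) pairs that cited it."""
    index = defaultdict(set)
    for query, model, domain in rows:
        index[domain].add((query, model))
    return index


def footprint(index, domain):
    """Return {query: sorted list of models} for one domain."""
    by_query = defaultdict(set)
    # .get() avoids creating an empty entry for unknown domains.
    for query, model in index.get(domain, set()):
        by_query[query].add(model)
    return {q: sorted(models) for q, models in by_query.items()}


index = build_index(observations)
print(footprint(index, "example-crm.com"))
# {'best crm for startups': ['chatgpt', 'gemini']}
```

Looking up one domain gives that domain's footprint; iterating over every key in the same structure is what the research hub does at index scale.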

Why measure AI citations at all?

Three of the most consequential questions in AI search are still answered with guesses:

  • Do the models agree? If you optimise for ChatGPT, are you also winning Gemini and Grok — or are they citing entirely different sources for the same question?
  • Is a win durable? If an AI cites you today, will it still cite you next week, or is the answer churning underneath you?
  • What actually earns the citation? Is it backlinks, brand mentions, review-platform presence, content format — or some mix the hype articles never quantify?

You can’t make a confident content decision without an honest answer to these. The research below is our attempt to give defensible, method-first answers — and to flag, out loud, where the data is directional rather than conclusive.

The four things this index can measure that a score can’t

1. Model disagreement

Because the index queries ChatGPT, Gemini and Grok independently for the same question, it can compare their cited sources directly. The overlap (and the gap) is a metric almost no one publishes because almost no one queries all three at scale. See how often ChatGPT, Gemini & Grok disagree on sources.
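
One common way to put a number on overlap is the Jaccard index: shared domains divided by all domains cited by either model. The sketch below uses it as an assumed stand-in; the source does not say which overlap measure MentionRadar uses, and the function names and sample domains are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Overlap of two source sets: 1.0 = identical, 0.0 = nothing shared."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty answers are treated as agreeing
    return len(a & b) / len(a | b)


def pairwise_overlap(citations):
    """citations: {model: domains cited} for ONE query."""
    return {
        (m1, m2): jaccard(citations[m1], citations[m2])
        for m1, m2 in combinations(sorted(citations), 2)
    }


# Hypothetical cited domains for a single query.
example = {
    "chatgpt": {"a.example", "b.example", "c.example"},
    "gemini": {"a.example", "b.example", "d.example"},
    "grok": {"e.example"},
}
overlap = pairwise_overlap(example)
print(overlap[("chatgpt", "gemini")])  # 0.5
print(overlap[("chatgpt", "grok")])    # 0.0
```

Disagreement is then simply one minus the overlap, averaged over however many queries are in the sample.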

2. Citation volatility

Re-running the same query over time shows how stable an AI answer’s source set is. Some answers are rock-steady; others swap citations week to week. That’s the difference between a defensible position and a coin flip — covered in AI citation volatility.
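
As a rough illustration, volatility for one query and one model can be expressed as the average turnover between consecutive runs. The definition below (one minus the Jaccard overlap of two adjacent runs) is an assumption chosen for simplicity, not a published MentionRadar formula, and the names and data are hypothetical.

```python
def turnover(previous, current):
    """Fraction of the combined source set that changed between two runs."""
    previous, current = set(previous), set(current)
    union = previous | current
    if not union:
        return 0.0  # nothing cited in either run, so nothing changed
    return 1 - len(previous & current) / len(union)


def mean_volatility(runs):
    """Average turnover across consecutive runs of the same query."""
    if len(runs) < 2:
        return 0.0
    steps = [turnover(a, b) for a, b in zip(runs, runs[1:])]
    return sum(steps) / len(steps)


# Hypothetical weekly runs for two queries.
stable = [{"a.example", "b.example"}] * 3
churning = [
    {"a.example", "b.example"},
    {"a.example", "c.example"},
    {"d.example", "c.example"},
]
print(mean_volatility(stable))              # 0.0
print(round(mean_volatility(churning), 2))  # 0.67
```

A score near 0 corresponds to the rock-steady case; a score near 1 means the source set is largely replaced from one run to the next.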

3. Content-type win rates

The domains and URLs the models cite fall into recognisable formats — comparison/“best of” lists, documentation, original research, community threads, review platforms. Which formats earn citations most often is an empirical question, addressed in which content types get cited most by AI.
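
Once each cited URL carries a format label, the tally itself is simple. The sketch below assumes the labelling has already been done by some other step (by hand or by a classifier, which is not shown), and the label names and counts are made up for illustration.

```python
from collections import Counter


def format_shares(cited_formats):
    """Share of citations per content format, largest first."""
    counts = Counter(cited_formats)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {fmt: n / total for fmt, n in counts.most_common()}


# Hypothetical format labels, one per observed citation.
labels = [
    "comparison list",
    "documentation",
    "comparison list",
    "community thread",
]
shares = format_shares(labels)
print(shares["comparison list"])  # 0.5
```

Note that this is a raw share of citations. A true win rate would also need to account for how common each format is among the pages that could have been cited.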

4. Category share of voice

Within a category, a handful of domains tend to absorb most citations. Measuring that concentration gives a realistic benchmark for “good” visibility — see category share-of-voice benchmarks.
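
Concentration can be summarised as the share of all citations captured by the top few domains. This is a minimal sketch under that assumption; the choice of top-k share as the measure, the function names, and the sample data are illustrative rather than MentionRadar's documented method.

```python
from collections import Counter


def share_of_voice(cited_domains):
    """Each domain's share of all citations in a category, largest first."""
    counts = Counter(cited_domains)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {domain: n / total for domain, n in counts.most_common()}


def top_k_share(cited_domains, k=3):
    """Combined share of the k most-cited domains."""
    shares = share_of_voice(cited_domains)
    return sum(list(shares.values())[:k])


# Hypothetical citations observed across one category's queries.
citations = (
    ["a.example"] * 5 + ["b.example"] * 3 + ["c.example"] + ["d.example"]
)
print(share_of_voice(citations)["a.example"])  # 0.5
print(round(top_k_share(citations, k=2), 2))   # 0.8
```

In this made-up category two domains take 80% of citations, which is the kind of concentration that signals a longer climb for a newcomer.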

Our research principles (so you can trust the numbers)

  • Method before number. Every finding states what was queried, how many models, and over what window. A number without a method is an opinion.
  • Estimates are labelled. Where a figure is directional — a sample, a snapshot, a trend rather than a census — we say so explicitly.
  • No fabricated proof. We don’t invent customer names, testimonials, or precise statistics we can’t stand behind. Claims are tied to the live index or to the three models’ observable behaviour.
  • Contrarian when the data warrants it. If a popular belief — say, that prompt volume is a precise ranking input — isn’t supported, we’ll say so plainly.

When this hub starts publishing hard figures, every one will be sourced and dated. Our methodology explains exactly how a number gets from the index (or an attributed third-party study) onto the page — what is sampled, over what window, and how estimates are labelled.

The full research cluster

Every study in this pillar, in reading order:

How to use this research

Read the findings as priorities, not promises. If the data says the three models disagree more than you assumed, optimise for all three rather than ChatGPT alone. If it says a category is highly concentrated, expect a longer climb and pick your queries carefully. The fastest way to make any of this concrete is to look at your own domain: the free Domain Check reads the same index backwards and returns the actual queries the three models already cite you on — your personal slice of the State of AI Citations.

Frequently asked questions

What is the State of AI Citations research hub?

It is MentionRadar’s standing research program into how ChatGPT, Gemini and Grok choose the sources they cite, built on a live query–domain index that re-runs real buyer questions and records which domains each model cites or mentions over time.

Does this hub publish hard statistics yet?

Not yet. Today the hub explains what the index can measure and frames each finding qualitatively. When we publish figures, each will be sourced and dated per our methodology. We do not invent percentages or charts to fill the gap.

Why query all three models instead of just ChatGPT?

Because the models frequently cite different sources for the same question. Measuring one model under-counts your real exposure — see how often ChatGPT, Gemini & Grok disagree.

How can I see my own domain in this data?

Run the free Domain Check. It reads the same index backwards and returns the real queries the three models already cite your domain on — your personal slice of the State of AI Citations.