RAG explained: how retrieval decides which sources AI cites

Most AI citations come through retrieval, not memory. Understand the retrieve → rank → synthesize → attribute pipeline and you understand exactly what to optimise.

Updated May 2026 · 8 min read
The short answer

RAG (retrieval-augmented generation) is the technique that lets an AI assistant answer using live, external documents instead of only its frozen training memory. When you ask ChatGPT, Gemini or Grok a question that benefits from current or specific information, the system runs a four-stage pipeline: retrieve a set of candidate passages from the web or an index, rank them by relevance and trust, synthesize an answer from the top passages, and attribute that answer to the sources it leaned on. The decisive insight for visibility is that retrieval operates on passages, not whole pages — a single well-formed paragraph that cleanly answers the question can be retrieved and cited even if the rest of the page is unremarkable. So “getting cited” is mostly the art of making individual passages retrievable: self-contained, on-topic, clearly written, and corroborated elsewhere on the web.

What is RAG, in plain English?

RAG stands for retrieval-augmented generation. Strip the jargon and it means: before the model writes, it goes and fetches relevant documents, then writes its answer grounded in those documents. The “augmented” part is the point — the model’s own training is augmented with fresh, external material at the moment you ask. That is how an assistant can cite a page published after its training cutoff, or pull a niche fact it never memorised.

For anyone who cares about being cited, RAG is the single most useful mental model, because it tells you precisely where the decision happens: in retrieval and ranking, before a word of the answer is written.

How does the RAG pipeline choose sources?

Think of it as four stages, each a filter your content has to pass:

The RAG pipeline and what each stage rewards
| Stage | What happens | What it rewards |
|---|---|---|
| 1. Retrieve | The system pulls a candidate set of passages from the web or an index that semantically match the query. | Passages that are on-topic and self-contained: chunks that clearly relate to the question on their own. |
| 2. Rank | Candidates are scored and re-ordered by relevance, quality and trust signals. | Clarity, specificity, and corroboration: sources that other reputable pages agree with. |
| 3. Synthesize | The model writes an answer drawing on the top-ranked passages. | Passages that are easy to lift and quote without rewriting or heavy interpretation. |
| 4. Attribute | The answer is linked back to the sources it relied on. | Sources that contributed a distinct, identifiable fact or framing to the answer. |

Each stage is a gate. You cannot be attributed if you were not synthesized; you are not synthesized if you did not rank; you do not rank if you were not retrieved. So the work starts at the very first gate: being retrievable at all.
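To make the gate idea concrete, here is a minimal toy sketch of the four stages in Python. Everything in it is an assumption for illustration: the `Passage` type, the `trust` scores, the `example` domains, and the use of plain word overlap as a stand-in for semantic matching. Real assistants use learned embeddings and far richer ranking signals, and none of this reflects any vendor's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class Passage:
    source: str   # hypothetical domain the passage came from
    text: str
    trust: float  # hypothetical 0..1 trust score, invented for this sketch


def tokens(text):
    """Lowercased word set; a crude stand-in for semantic meaning."""
    return {w.strip(".,?!").lower() for w in text.split()}


def retrieve(query, corpus, min_overlap=2):
    """Gate 1: keep passages sharing at least `min_overlap` words with the query."""
    q = tokens(query)
    return [p for p in corpus if len(q & tokens(p.text)) >= min_overlap]


def rank(query, candidates, top_k=2):
    """Gate 2: order candidates by word overlap weighted by trust, keep the top few."""
    q = tokens(query)
    ordered = sorted(
        candidates,
        key=lambda p: len(q & tokens(p.text)) * p.trust,
        reverse=True,
    )
    return ordered[:top_k]


def synthesize(ranked):
    """Gate 3: placeholder for the model; it simply joins the top passages."""
    return " ".join(p.text for p in ranked)


def attribute(ranked):
    """Gate 4: cite the sources of the passages that were used."""
    return [p.source for p in ranked]


def answer(query, corpus):
    ranked = rank(query, retrieve(query, corpus))
    return synthesize(ranked), attribute(ranked)


corpus = [
    Passage("a.example", "RAG retrieves passages before the model writes an answer.", 0.9),
    Passage("b.example", "RAG retrieves passages and ranks them by relevance.", 0.5),
    Passage("c.example", "Our company picnic is in June.", 0.9),
]
text, sources = answer("How does RAG retrieve passages?", corpus)
print(sources)
```

The off-topic passage from `c.example` has a high trust score but never clears the first gate, so it can never reach the attribution step, whatever happens later in the pipeline.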

What makes a passage retrievable?

Retrieval matches the meaning of a query against the meaning of your passages, so the most retrievable content shares a few traits:

  • Self-contained. The passage answers the question without needing the three paragraphs above it for context. Pulled out alone, it still makes sense.
  • Topically tight. One section, one idea. Passages that wander dilute the semantic match and rank worse.
  • Question-shaped. Headings phrased as the literal question, answered immediately beneath, line up with how queries are matched. The full playbook is in “Semantic completeness & answer blocks”.
  • Corroborated. The ranking stage leans on trust, and trust is largely a function of whether other reputable sources say the same thing; see “Do backlinks affect AI recommendations?”
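The "topically tight" trait can be illustrated with a small similarity calculation. The sketch below assumes a bag-of-words cosine similarity as a rough proxy for the semantic matching that real retrievers do with embeddings; the query and both passages are invented examples.

```python
import math
from collections import Counter


def bag(text):
    """Word counts for a text, lowercased and stripped of basic punctuation."""
    return Counter(w.strip(".,?!").lower() for w in text.split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


query = "what is retrieval augmented generation"
tight = "Retrieval augmented generation is fetching documents before writing."
# Same passage, plus an off-topic aside that dilutes the match.
wandering = tight + " Our founder loves hiking and the office dog answers to Biscuit."

tight_score = cosine(bag(query), bag(tight))
wandering_score = cosine(bag(query), bag(wandering))
print(round(tight_score, 3), round(wandering_score, 3))
```

Both passages contain the same answer, yet the wandering one scores lower because the extra words enlarge its vector without adding anything that matches the query. Embedding-based retrievers behave differently in detail, but the dilution effect is the same idea.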

Where does training memory fit in?

RAG does not replace the model’s training; it supplements it. For broad, stable facts the model may answer from memory and cite nothing. For fresh, specific or contested questions it retrieves. This split is why citation behaviour shifts over time and why the same question can produce different sources on different days; we unpack that in “How often do AI models update what they cite?” The practical takeaway: the queries where you can win citations are disproportionately the retrieval-heavy ones, so target those.

How do I see which passages of mine get retrieved?

You cannot watch the retrieval step directly, but you can observe its result: the set of queries on which the models actually name your domain. That is a reverse AI search — read the query–domain index backwards from your domain. Run the free Domain Check to see the questions where your passages already clear all four gates, then look for the adjacent questions where they do not. Those gaps are your retrievability worklist.
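Reading a query–domain index "backwards" is a simple inversion. The sketch below is illustrative only: the `citations` data, the domain names, and the `invert` helper are all hypothetical, and it is not a description of how any particular tool builds its index.

```python
from collections import defaultdict

# Hypothetical observations: for each query, the domains an assistant cited.
citations = {
    "what is rag": ["example.com", "docs.example.org"],
    "how does rag rank sources": ["docs.example.org"],
    "rag vs web search": ["example.com"],
}


def invert(query_to_domains):
    """Turn a query -> domains mapping into a domain -> queries mapping."""
    domain_to_queries = defaultdict(list)
    for query, domains in query_to_domains.items():
        for domain in domains:
            domain_to_queries[domain].append(query)
    return dict(domain_to_queries)


by_domain = invert(citations)
won = set(by_domain.get("example.com", []))
# Adjacent queries in the same set where the domain was not cited.
gaps = [q for q in citations if q not in won]
print(sorted(won), gaps)
```

Here `won` holds the queries where `example.com` cleared all four gates, and `gaps` is the worklist: related queries where some other source was cited instead.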

Frequently asked questions

Is every AI answer powered by RAG?

No. Some answers come purely from the model’s training memory, especially for well-known facts. RAG kicks in when the system decides fresh or specific external information would help, which is exactly when citations appear. Answers that carry citations are, as a rule, the retrieval-backed ones.

Does RAG retrieve whole pages or passages?

Passages. Documents are typically split into chunks, and retrieval scores those chunks individually. This is why a single strong paragraph can be cited even when the surrounding page is mediocre — and why structure matters so much.
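As a rough illustration of chunk-level scoring, the sketch below splits a page on blank lines and scores each chunk separately. Paragraph splitting and word overlap are simplifying assumptions; production systems commonly use token-based chunking and embedding similarity, and the sample page is invented.

```python
def chunk_by_paragraph(page_text):
    """Split a page into paragraph chunks on blank lines."""
    return [p.strip() for p in page_text.split("\n\n") if p.strip()]


def overlap_score(query, chunk):
    """Count distinct words shared by the query and one chunk."""
    q = {w.strip(".,?!").lower() for w in query.split()}
    c = {w.strip(".,?!").lower() for w in chunk.split()}
    return len(q & c)


page = (
    "Welcome to our blog.\n\n"
    "RAG retrieves passages, not whole pages.\n\n"
    "Subscribe to our newsletter."
)
query = "Does RAG retrieve whole pages or passages?"

chunks = chunk_by_paragraph(page)
# Each chunk is scored on its own; the rest of the page does not help or hurt it.
best = max(chunks, key=lambda c: overlap_score(query, c))
print(best)
```

The greeting and the newsletter prompt score zero, but they do not drag down the middle paragraph, which is the only chunk that would be retrieved.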

How do I make my content more retrievable?

Write self-contained passages that answer one question each, lead with the answer, keep the topic of each section tight, and earn corroboration so the ranking stage trusts you. Retrievability is mostly structure plus trust.

Is RAG the same as a web search?

Related but not identical. A web search returns links for a human to read; RAG retrieves passages for a model to read, then writes an answer from them. The retrieval step often uses search-like signals, but the output is synthesis, not a result list.