Why Two AI Tools Give Different Visibility Scores for My Brand
You ran two checkers and got two different numbers. Neither tool is broken; they measured different things. Here's why scores diverge, and why the query list is the part you should actually trust.
Two AI visibility tools give you different scores because they aren’t measuring the same thing. Each tool uses its own set of prompts and runs them against its own choice of models (some check only ChatGPT, others a different mix). It samples each prompt a different number of times and rolls the results into a score with its own private formula. Change any one of those inputs (the prompt list, the models, the number of runs, or how a “mention” is counted) and the number moves, even though your brand hasn’t. That is why a single 0–100 score is a weak signal that can’t be compared across tools. What is comparable and actionable is the underlying list: the specific queries you’re cited on, which models named you, and who appears beside you. Trust the query list, not the headline number. When you compare tools, compare what they actually return, not whose score is higher.
The short version: they measured different things
It feels alarming to get two numbers, but it’s the expected outcome. A visibility score isn’t a fact about your brand the way your phone number is. It’s the output of a recipe, and every tool uses a different recipe. Below are the four ingredients that diverge.
Reason 1: Different prompt sets
Every tool decides which questions to ask the models on your behalf. One might test ten broad “best [category]” prompts; another might test forty narrower, intent-specific ones. Your brand can be strong on the narrow questions and weak on the broad ones — so the tool that happens to ask more of your strong questions reports a higher score. Same brand, different exam.
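Here is a toy Python sketch of that effect. The prompts and per-prompt results are invented, and the scoring rule (the percentage of a tool's prompts where the brand appears) is an assumption for illustration, not any real tool's formula. Both tools see identical answers. They differ only in which questions they ask.

```python
# Hypothetical per-prompt results for ONE brand: True means the model named it.
# The answers are identical for both tools; only the choice of prompts differs.
RESULTS = {
    "best accounting software": False,
    "best accounting software for small business": False,
    "top 10 finance apps": False,
    "accounting software for freelancers": True,
    "invoicing app that syncs with my bank": True,
    "cheapest bookkeeping tool for a sole trader": True,
}

def share_of_prompts(prompts):
    """Assumed scoring rule: percent of the tool's prompts where the brand appears."""
    hits = sum(RESULTS[p] for p in prompts)
    return round(100 * hits / len(prompts))

# Hypothetical tool A leans on broad "best [category]" prompts.
tool_a = ["best accounting software", "best accounting software for small business",
          "top 10 finance apps", "invoicing app that syncs with my bank"]
# Hypothetical tool B leans on narrower, intent-specific prompts.
tool_b = ["accounting software for freelancers", "invoicing app that syncs with my bank",
          "cheapest bookkeeping tool for a sole trader", "best accounting software"]

print(share_of_prompts(tool_a))  # 25
print(share_of_prompts(tool_b))  # 75
```

Same brand, same answers, and the scores differ by 50 points because the two exams cover different questions.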
Reason 2: Different models
The assistants often disagree with each other. A brand ChatGPT names confidently might be absent from Gemini's answers, or vice versa. So a tool that checks only ChatGPT can score you quite differently from one that blends ChatGPT, Gemini and Grok. If you don’t know which models a tool covers, its score is hard to interpret. We unpack the model differences in “ChatGPT vs Gemini vs Grok: how each picks businesses.”
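A minimal sketch of the same brand scored under two model mixes. The per-model mention rates are invented, and averaging the rate across whichever models a tool covers is an assumed scoring rule, not a documented one.

```python
# Hypothetical mention rates for one brand, per assistant, on the same prompts.
RATE_BY_MODEL = {"ChatGPT": 0.70, "Gemini": 0.20, "Grok": 0.30}

def blended_score(models):
    """Assumed rule: average the brand's mention rate over the models a tool covers."""
    return round(100 * sum(RATE_BY_MODEL[m] for m in models) / len(models))

print(blended_score(["ChatGPT"]))                    # 70
print(blended_score(["ChatGPT", "Gemini", "Grok"]))  # 40
```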
Reason 3: Different sampling
AI answers aren’t deterministic: ask the same question twice and you can get a different answer citing different sources. Tools handle this by sampling, running each prompt some number of times and aggregating the results. A tool that runs each prompt once is noisier than one that runs it many times and averages. A different sampling depth produces a different number purely from variance, without any real change in your visibility.
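A small simulation shows how much of the movement can be pure noise. It assumes a brand that the model names with a fixed 40% probability on each of 10 prompts. Nothing changes between checks, so any movement is sampling noise. The simulation then measures how far the score swings across many repeated checks. In theory the standard deviation is about 15 points with one run per prompt and about 3.5 points with 20 runs per prompt, because the noise shrinks in proportion to one over the square root of the number of samples.

```python
import random
import statistics

# Assumptions for illustration: 10 prompts, and on every run the model names
# the brand with a fixed 40% probability. The brand never changes between checks.
TRUE_RATE = 0.4
NUM_PROMPTS = 10

def one_check(runs_per_prompt, rng):
    """One full check: percent of all sampled answers that named the brand."""
    total = NUM_PROMPTS * runs_per_prompt
    hits = sum(rng.random() < TRUE_RATE for _ in range(total))
    return 100 * hits / total

def spread(runs_per_prompt, checks=500, seed=7):
    """Standard deviation of the score across many repeated, identical checks."""
    rng = random.Random(seed)
    scores = [one_check(runs_per_prompt, rng) for _ in range(checks)]
    return statistics.pstdev(scores)

print(f"1 run per prompt:   typical swing about {spread(1):.1f} points")
print(f"20 runs per prompt: typical swing about {spread(20):.1f} points")
```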
Reason 4: Different definitions and formulas
Finally, each tool defines its terms its own way. Does a plain brand mention count the same as a linked citation? Is position in the answer weighted? Is the score simply the percentage of prompts you appeared in, or something more elaborate? These formulas are usually undisclosed, so two scores built on different definitions can’t be compared like-for-like.
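To see how much the definition alone matters, this sketch feeds the same five observed answers into three formulas. The answers are invented, and all three formulas are hypothetical examples of plausible choices, not the methods of any particular tool.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Answer:
    mentioned: bool          # brand name appears anywhere in the answer
    linked: bool             # answer links to or cites the brand's site
    position: Optional[int]  # 1 = first brand listed; None if not mentioned

# The SAME five hypothetical answers, scored three different ways.
ANSWERS = [
    Answer(True, True, 1),
    Answer(True, False, 4),
    Answer(True, False, 5),
    Answer(False, False, None),
    Answer(True, True, 3),
]

def mention_rate(answers):
    """Formula A: percent of answers that mention the brand at all."""
    return 100 * sum(a.mentioned for a in answers) / len(answers)

def citation_rate(answers):
    """Formula B: only answers with a linked citation count."""
    return 100 * sum(a.linked for a in answers) / len(answers)

def position_weighted(answers):
    """Formula C: a mention at position p earns 1/p, so being listed first counts most."""
    return 100 * sum(1 / a.position for a in answers if a.mentioned) / len(answers)

for name, formula in [("A", mention_rate), ("B", citation_rate), ("C", position_weighted)]:
    print(name, round(formula(ANSWERS)))
# A 80
# B 40
# C 36
```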
What to trust instead: the query list
Because the score is recipe-dependent, the robust signal is the layer underneath it — the concrete, checkable facts a number compresses away:
- The exact queries you’re cited on, so you can judge intent and value yourself.
- Which models named you, so “cited by ChatGPT” isn’t conflated with “cited by AI.”
- The competitors appearing beside you, which tells you who the models treat as your substitutes.
Two query lists can be reconciled by reading them. Two scores can’t. That’s the whole case for preferring the list, and it’s why our free Domain Check hands you the real queries rather than a lone figure. The fuller argument is in “Free AI visibility check that returns queries, not just a score.”
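Reconciling two lists can be as simple as set arithmetic. In this sketch both tools' query exports are invented. The point is that every difference is something you can read and check, which a pair of scores never offers.

```python
# Hypothetical exports: the queries each tool reports you as cited on.
tool_a = {
    "best bookkeeping app for freelancers",
    "invoicing software that accepts card payments",
    "alternatives to a big-name accounting suite",
}
tool_b = {
    "best bookkeeping app for freelancers",
    "invoicing software that accepts card payments",
    "simple payroll for a two-person business",
    "accounting app with receipt scanning",
}

both = tool_a & tool_b    # both tools found you cited on these
only_a = tool_a - tool_b  # tool B didn't ask these, or didn't see you cited
only_b = tool_b - tool_a  # and vice versa

print("Both tools:", sorted(both))
print("Only tool A:", sorted(only_a))
print("Only tool B:", sorted(only_b))
```

A query that appears in only one list usually means the other tool never asked it (Reason 1) or didn't see you cited when it did (Reasons 2 and 3). Either way, you can judge that query's value yourself.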
So how should I actually use a score?
Use it for one job only: tracking change within a single tool over time, holding the method constant. “My score in tool X rose this month” is meaningful. “Tool X scores me higher than tool Y” is not. When you’re choosing between tools, compare what they return, not whose number is bigger; our free AI visibility checkers comparison lays out exactly that.
Frequently asked questions
Which tool's score is the 'correct' one?
None of them, in an absolute sense. A score is a summary of that tool’s own prompt set and method. There is no universal denominator, so scores aren’t comparable across tools. Use a score only to track change within one tool over time.
If scores aren't comparable, what should I compare?
The query list and model coverage. “Cited on these 12 buyer questions, by these models, alongside these competitors” is concrete and verifiable. That’s the output worth comparing between tools.
Does the same tool give the same number every time?
Not necessarily. AI answers vary between runs, so even one tool’s score can move from sampling alone. That’s another reason to read the underlying queries rather than chase a single figure.