Glossary

Google-Extended

Google-Extended is a separate robots.txt token that controls whether your content can be used to train Google's Gemini and Vertex AI, independently of normal Googlebot indexing.

Updated May 2026

Definition
The short answer

Google-Extended is a standalone robots.txt user-agent token that lets you control whether Google may use your site’s content to train and ground its generative AI products, such as Gemini and Vertex AI. It is deliberately separate from Googlebot: you can stay fully indexed in Google Search while opting out of AI training. It is not a crawler itself but a control signal that Google honors when deciding how your content feeds its AI models.

What does Google-Extended do?

Google-Extended gives publishers a switch for AI training use. By adding a rule for it in robots.txt, you tell Google whether your content can be used to develop and improve its generative AI products, separate from the decision to be indexed and ranked in Search.

How is it different from Googlebot?

Googlebot is the crawler that powers Search indexing. Google-Extended is a policy token for AI training. Allowing Googlebot keeps you in Search; disallowing Google-Extended opts you out of AI training without touching your rankings. Each is controlled by its own group of rules in robots.txt, so a rule for one does not change the other. For comparison with the named AI crawlers, see the glossary entry GPTBot / ClaudeBot / PerplexityBot.
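The separation between the two tokens can be checked with any robots.txt parser. The sketch below uses Python's standard urllib.robotparser to evaluate a hypothetical robots.txt one user-agent token at a time. The file contents, the example.com URL, and the policy_for helper are illustrative assumptions, and the parser applies only generic robots.txt matching; it does not reproduce how Google itself evaluates the token.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: each token gets its own group of rules.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: Google-Extended
Disallow: /

User-agent: GPTBot
Disallow: /
"""


def policy_for(robots_txt: str, tokens: list[str], url: str) -> dict[str, bool]:
    """Return {token: allowed} for one URL, using generic robots.txt matching."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {token: parser.can_fetch(token, url) for token in tokens}


if __name__ == "__main__":
    tokens = ["Googlebot", "Google-Extended", "GPTBot", "ClaudeBot"]
    url = "https://example.com/article"  # placeholder URL
    for token, allowed in policy_for(ROBOTS_TXT, tokens, url).items():
        print(f"{token}: {'allowed' if allowed else 'disallowed'}")
```

A token with no matching group and no wildcard group (ClaudeBot in this sketch) is treated as allowed, which is why an opt-out needs an explicit rule for each token.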

Example

A publisher who wants Search traffic but not AI training would keep Googlebot allowed and add a two-line group to robots.txt: the line User-agent: Google-Extended, followed by the line Disallow: /. Their pages remain eligible to rank in Google Search but are not used to train Gemini.
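As a rough illustration of this example, the sketch below appends that two-line group to an existing robots.txt and then checks the result with Python's standard urllib.robotparser. The add_opt_out helper, the starting file, and the example.com URL are hypothetical, and the check reflects generic robots.txt matching rather than Google's own processing.

```python
from urllib.robotparser import RobotFileParser

# The two-line group from the example: opt out of AI training only.
OPT_OUT_GROUP = "User-agent: Google-Extended\nDisallow: /\n"


def add_opt_out(robots_txt: str) -> str:
    """Append the Google-Extended opt-out group to robots.txt text.

    Hypothetical helper: the text is returned unchanged if the token is
    already mentioned, so an existing rule is never silently overridden.
    """
    if "google-extended" in robots_txt.lower():
        return robots_txt
    existing = robots_txt.rstrip("\n")
    # Groups are conventionally separated by a blank line.
    return (existing + "\n\n" if existing else "") + OPT_OUT_GROUP


if __name__ == "__main__":
    # Assumed starting point: a file that only hides a private directory.
    updated = add_opt_out("User-agent: *\nDisallow: /private/\n")
    print(updated)

    parser = RobotFileParser()
    parser.parse(updated.splitlines())
    url = "https://example.com/article"  # placeholder URL
    print("Googlebot allowed:", parser.can_fetch("Googlebot", url))
    print("Google-Extended allowed:", parser.can_fetch("Google-Extended", url))
```

The helper leaves every other group untouched, which mirrors the point of the example: the Googlebot rules that govern Search are not edited at all.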

Frequently asked questions

Does blocking Google-Extended remove me from Search?
No. Google-Extended is independent of Googlebot. Disallowing it affects generative AI training use, not your normal Search indexing or rankings.
Is it a crawler?
No. It is a robots.txt token that acts as a control signal. There is no separate “Google-Extended” bot fetching pages; it governs how Google may use content it already accesses.