Glossary

GPTBot / ClaudeBot / PerplexityBot

These are the named user-agents AI companies use to crawl the web — for training data, for live retrieval, or both. Knowing them lets you allow or block access in robots.txt.

Updated May 2026

Definition
The short answer

GPTBot, ClaudeBot, and PerplexityBot are named crawler user-agents operated by OpenAI, Anthropic, and Perplexity respectively. They identify themselves in the HTTP user-agent string so site owners can recognize, allow, or block them via robots.txt. Some fetch pages to build or refine training data; others fetch pages live to ground answers at query time. Controlling them is how you decide whether an AI engine can read — and potentially cite — your content.
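Because each crawler names itself in the user-agent header, recognizing one in a server log comes down to a string match. The following is a minimal sketch, assuming the token appears somewhere in the header value; the function name and both sample header strings are hypothetical and not taken from any vendor's documentation.

```python
from typing import Optional

# Tokens named in this glossary entry, mapped to their operators.
AI_CRAWLER_TOKENS = {
    "GPTBot": "OpenAI",
    "ClaudeBot": "Anthropic",
    "PerplexityBot": "Perplexity",
}


def identify_ai_crawler(user_agent: str) -> Optional[str]:
    """Return the crawler token found in a User-Agent header, or None."""
    lowered = user_agent.lower()
    for token in AI_CRAWLER_TOKENS:
        # Case-insensitive substring match on the token name.
        if token.lower() in lowered:
            return token
    return None


# Hypothetical header values, for illustration only.
sample_bot = "Mozilla/5.0 (compatible; GPTBot/1.0; +https://example.com/bot)"
sample_browser = "Mozilla/5.0 (X11; Linux x86_64) Firefox/125.0"

print(identify_ai_crawler(sample_bot))
print(identify_ai_crawler(sample_browser))
```

A user-agent header can be set to anything by the client, so a match like this is a hint for log analysis rather than proof of who sent the request.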

What do these bots do?

Each is an automated crawler from a major AI company. GPTBot is OpenAI’s crawler; ClaudeBot is Anthropic’s; PerplexityBot is Perplexity’s. They request pages, read the content, and feed it into either model training or live answer retrieval, depending on the bot and its purpose.

How are these different from Googlebot?

Googlebot crawls primarily for the traditional search index. These AI bots crawl for AI training or AI answer retrieval. They are distinct user-agents, so a rule that allows Googlebot does not automatically allow GPTBot, ClaudeBot, or PerplexityBot — you control each one separately.

Example

To let Perplexity read your docs but block OpenAI’s training crawler, you would add an Allow rule for PerplexityBot and a Disallow rule for GPTBot in robots.txt. The named user-agents are what make that granular control possible.
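The example above can be sketched as a robots.txt file and checked with Python's standard-library parser. This is an illustration under stated assumptions: the rules apply to the whole site rather than a docs path, and the URL is a placeholder.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content: one group per named user-agent.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Allow: /

User-agent: GPTBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

docs_url = "https://example.com/docs/"  # hypothetical URL

# Ask the parser what each user-agent may fetch under these rules.
for bot in ("PerplexityBot", "GPTBot", "Googlebot"):
    print(bot, parser.can_fetch(bot, docs_url))
```

Googlebot is included to show that a user-agent with no matching group, and no wildcard group, falls back to being allowed under these rules.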

Frequently asked questions

Should I block these crawlers?
Usually not if you want AI visibility — blocking a retrieval crawler can stop an engine from citing you. Blocking training crawlers is a separate choice about how your content is used to train models.
How do I control them?
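Add a user-agent group to your robots.txt for each named bot. Compliant crawlers honor these rules, but robots.txt is advisory and does not technically enforce access. Pair this with a rule for Google-Extended, the robots.txt token Google provides for controlling how your content is used by its AI models, and with an llms.txt file, a proposed convention for pointing AI tools at your key content.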
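If you manage several bots, one way to keep the rules consistent is to generate the file from a single policy table. The sketch below assumes an all-or-nothing choice per bot; the helper name and the policy values are hypothetical and chosen only for illustration.

```python
def build_robots_txt(policy: dict[str, bool]) -> str:
    """Render one robots.txt group per user-agent token.

    policy maps a token to True (allow everything) or False (block everything).
    """
    groups = []
    for token, allowed in policy.items():
        rule = "Allow: /" if allowed else "Disallow: /"
        groups.append(f"User-agent: {token}\n{rule}")
    # Blank lines separate groups; the file ends with a newline.
    return "\n\n".join(groups) + "\n"


# Hypothetical policy, for illustration only.
policy = {
    "PerplexityBot": True,
    "ClaudeBot": True,
    "GPTBot": False,
    "Google-Extended": False,
}

robots_txt = build_robots_txt(policy)
print(robots_txt)
```

Real sites usually need path-level rules as well, which this sketch leaves out to stay short.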