GPTBot / ClaudeBot / PerplexityBot
These are the named user-agents AI companies use to crawl the web — for training data, for live retrieval, or both. Knowing them lets you allow or block access in robots.txt.
GPTBot, ClaudeBot, and PerplexityBot are named crawler user-agents operated by OpenAI, Anthropic, and Perplexity respectively. Each identifies itself with a product token in the HTTP User-Agent header, so site owners can recognize it in logs and allow or block it in robots.txt. Some fetch pages to build or refine training data; others fetch pages live to ground answers at query time. Because robots.txt is a request that well-behaved crawlers honor rather than an enforced access control, these rules are how you tell a compliant AI engine whether it may read, and potentially cite, your content.
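As a minimal sketch of recognizing these crawlers in server logs, the Python below looks for each bot's product token in a User-Agent header value. The function name `identify_ai_crawler`, the lookup table, and the sample header strings are illustrative assumptions, not formats published by the vendors; real header values vary by version. A matching header is also only a claim, since any client can send any User-Agent string.

```python
from typing import Optional

# Product tokens named in this entry, mapped to their operators.
AI_CRAWLER_TOKENS = {
    "GPTBot": "OpenAI",
    "ClaudeBot": "Anthropic",
    "PerplexityBot": "Perplexity",
}


def identify_ai_crawler(user_agent: str) -> Optional[str]:
    """Return the operator whose token appears in the header, or None."""
    lowered = user_agent.lower()
    for token, operator in AI_CRAWLER_TOKENS.items():
        if token.lower() in lowered:
            return operator
    return None


# Hypothetical header values, for illustration only.
samples = [
    "Mozilla/5.0 (compatible; GPTBot/1.0)",
    "Mozilla/5.0 (compatible; ClaudeBot/1.0)",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/120.0",
]
for ua in samples:
    print(identify_ai_crawler(ua) or "not a listed AI crawler")
```

The match is case-insensitive and ignores the version suffix, so it keeps working if a vendor bumps the version number in its header.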
What do these bots do?
Each is an automated crawler from a major AI company. GPTBot is OpenAI’s crawler; ClaudeBot is Anthropic’s; PerplexityBot is Perplexity’s. They request pages, read the content, and feed it into either model training or live answer retrieval, depending on the bot and its purpose.
How are these different from Googlebot?
Googlebot crawls primarily for the traditional search index. These AI bots crawl for AI training or AI answer retrieval. They are distinct user-agents, so robots.txt rules written for Googlebot do not apply to GPTBot, ClaudeBot, or PerplexityBot. Each bot follows the group that names it, or the catch-all "User-agent: *" group if none does, which lets you control each one separately.
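To illustrate that each crawler is matched separately, here is a small sketch using Python's standard-library `urllib.robotparser`. The robots.txt text and the example.com URL are hypothetical. The file allows Googlebot by name and sends every other agent to a catch-all group that disallows everything, so the AI bots are not covered by the Googlebot rule.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one group for Googlebot, a catch-all for the rest.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://example.com/pricing"  # placeholder URL
for agent in ("Googlebot", "GPTBot", "ClaudeBot", "PerplexityBot"):
    verdict = "allowed" if parser.can_fetch(agent, url) else "blocked"
    print(f"{agent}: {verdict}")
```

Under these rules only Googlebot is reported as allowed. Note that `urllib.robotparser` is one parser among many, and a real crawler may resolve overlapping rules differently.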
Example
To let Perplexity read your docs but block OpenAI’s training crawler, you would add an Allow rule for PerplexityBot and a Disallow rule for GPTBot in robots.txt. The named user-agents are what make that granular control possible.
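A rough sketch of that setup follows, with the robots.txt held in a Python string so it can be checked with the standard-library `urllib.robotparser`. The `/docs/` path and the example.com URL are placeholders, not values from any real site; only the two user-agent names come from the example above.

```python
from urllib.robotparser import RobotFileParser

# Sketch of the robots.txt described above; /docs/ is a placeholder path.
EXAMPLE_ROBOTS_TXT = """\
User-agent: PerplexityBot
Allow: /docs/

User-agent: GPTBot
Disallow: /
"""

rules = RobotFileParser()
rules.parse(EXAMPLE_ROBOTS_TXT.splitlines())

doc_url = "https://example.com/docs/getting-started"  # placeholder URL
print("PerplexityBot:", rules.can_fetch("PerplexityBot", doc_url))
print("GPTBot:", rules.can_fetch("GPTBot", doc_url))
```

Agents that match neither group, such as Googlebot here, have no rules applied and are treated as allowed by default, so add a group for each crawler you want to restrict.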