robots.txt does not block key pages
Your robots.txt must allow crawlers to reach your homepage and main content.
Why this matters
robots.txt is the first file a crawler fetches. It tells the crawler which URLs it may request. A single wrong line can lock out every AI crawler at once.
The most common failure is a development-era Disallow: / that never got removed before launch. The second most common is an overbroad rule that blocks /api or /blog and accidentally catches the main content, because Disallow rules are prefix matches: Disallow: /api also blocks /api-docs and anything else starting with /api.
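The prefix-match pitfall is easy to demonstrate with Python's standard-library robots.txt parser (the domain and paths here are hypothetical placeholders):

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
# Intended to block only an internal API, but "Disallow: /api" is a
# prefix match: it also catches /api-docs, /apis, and so on.
parser.parse("User-agent: *\nDisallow: /api\n".splitlines())

print(parser.can_fetch("*", "https://example.com/api-docs/intro"))  # False
print(parser.can_fetch("*", "https://example.com/about"))           # True
```

A trailing slash (Disallow: /api/) narrows the rule to the directory itself and avoids catching sibling paths.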
A good robots.txt for AI readiness
```
User-agent: *
Allow: /

# Optional: point agents at your sitemap
Sitemap: https://yourdomain.com/sitemap.xml
```
If you want to explicitly welcome AI crawlers, add named entries for each bot. See the GPTBot, ClaudeBot, and PerplexityBot checks for specifics.
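One way to do that is a per-bot entry for each crawler you want to admit, falling back to a permissive wildcard. This is a sketch, not a canonical list; substitute your own domain and the bot names you actually care about:

```
# Explicitly welcome named AI crawlers
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

# Everyone else
User-agent: *
Allow: /

Sitemap: https://yourdomain.com/sitemap.xml
```

Note that a named entry overrides the wildcard group for that bot, so once you add a User-agent: GPTBot section, GPTBot follows only that section's rules.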
Things to avoid
- Disallow: / blocks everything.
- Disallow: /* is the same rule in disguise.
- Per-bot blocks left over from a previous content strategy.
- A robots.txt that returns 404: some strict crawlers treat a missing file as disallow-all.