
robots.txt does not block key pages

Your robots.txt must allow crawlers to reach your homepage and main content.

Why this matters

robots.txt is the first file a crawler fetches. It tells the crawler which URLs it is allowed to request. A single wrong line can lock out every AI crawler at once.

The most common failure is a development-era Disallow: / that never got removed before launch. The second most common is an overbroad rule: Disallow matches by URL prefix, so a rule aimed at /api or /blog can accidentally catch the main content as well.
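Both failure modes can be caught before launch by running a candidate robots.txt through Python's standard-library urllib.robotparser. A minimal sketch (the file body and domain are illustrative):

```python
from urllib.robotparser import RobotFileParser

# A development-era robots.txt that never got updated for launch.
leftover = """\
User-agent: *
Disallow: /
"""

rfp = RobotFileParser()
rfp.parse(leftover.splitlines())

# Every compliant crawler is now locked out of the whole site.
print(rfp.can_fetch("*", "https://yourdomain.com/"))           # False
print(rfp.can_fetch("GPTBot", "https://yourdomain.com/blog"))  # False
```

In production you would point RobotFileParser at the live file with set_url() and read() instead of parsing a string.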

A good robots.txt for AI readiness

User-agent: *
Allow: /

# Optional: point agents at your sitemap
Sitemap: https://yourdomain.com/sitemap.xml

If you want to explicitly welcome AI crawlers, add named entries for each bot. See the GPTBot, ClaudeBot, and PerplexityBot checks for specifics.
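If you do add named entries, the same urllib.robotparser check confirms that each bot resolves to the rule you intended. A sketch with an assumed file body; bots without a named entry fall back to the * group:

```python
from urllib.robotparser import RobotFileParser

# Named entries welcoming specific AI crawlers, plus a catch-all.
robots = """\
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: *
Allow: /
"""

rfp = RobotFileParser()
rfp.parse(robots.splitlines())

for bot in ("GPTBot", "ClaudeBot", "PerplexityBot"):
    # PerplexityBot has no named entry, so it matches the * group.
    print(bot, rfp.can_fetch(bot, "https://yourdomain.com/blog/post"))
# all three print True
```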

Things to avoid

  • Disallow: / — this blocks everything.
  • Disallow: /* — the same thing in disguise: the trailing wildcard matches every path.
  • Per-bot blocks left over from a previous content strategy.
  • Robots files that return a server error — RFC 9309 tells crawlers to treat an unreachable (5xx) robots.txt as disallow-all. A 404 means no restrictions, but serving a real file removes any doubt.
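The first three patterns are easy to flag mechanically. A minimal lint sketch (a line-based scan, not a full robots.txt parser; the function name is ours):

```python
def lint_robots(text: str) -> list[str]:
    """Flag blanket-disallow rules in a robots.txt body."""
    warnings = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line.lower().startswith("disallow:"):
            continue
        path = line.split(":", 1)[1].strip()
        if path in ("/", "/*"):
            warnings.append(f"line {lineno}: '{line}' blocks the entire site")
    return warnings

print(lint_robots("User-agent: *\nDisallow: /*\n"))
# → ["line 2: 'Disallow: /*' blocks the entire site"]
```

Running it over each per-bot group separately would also surface leftover per-bot blocks.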