Lesson 5

Robots.txt

Why this matters

One misplaced character in robots.txt can hide your entire site from Google. This file is small, powerful, and easy to break.

What it does

robots.txt tells well-behaved crawlers which URLs they may or may not crawl. It does NOT remove pages from the index; it only blocks crawling, so a blocked URL can still appear in search results if other pages link to it.

Safe defaults

User-agent: *
Allow: /
Sitemap: https://yourdomain.com/sitemap.xml
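To sanity-check a file like this before shipping it, you can run it through Python's standard-library parser. This is a minimal sketch: the helper name `load_rules` and the `/pricing` URL are made up for illustration, and `site_maps()` needs Python 3.8 or newer.

```python
from urllib.robotparser import RobotFileParser

# The safe-default file from this lesson, one directive per line.
ROBOTS_TXT = """\
User-agent: *
Allow: /
Sitemap: https://yourdomain.com/sitemap.xml
"""


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without making any network request."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

# Hypothetical page: with the safe defaults, any crawler may fetch it.
print(rules.can_fetch("Googlebot", "https://yourdomain.com/pricing"))

# Sitemap URLs declared in the file.
print(rules.site_maps())
```

Note that each directive must sit on its own line; a file with everything on one line will not parse as intended.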

What to block

  • /admin/, /dashboard/ — private areas
  • Search-results pages with infinite parameter combos
  • Staging/preview environments (put these behind a login where possible; a noindex meta tag only works if crawlers are allowed to fetch the page and see it)
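Here is a sketch of what those block rules could look like, with a quick check of which paths end up blocked. The `/search` path and the test URLs are assumptions for illustration; use whatever your site really serves. Staging is left out because robots.txt applies per host, so a staging domain needs its own file.

```python
from urllib.robotparser import RobotFileParser

# Example rules: private areas plus internal search results.
# "/search" is a hypothetical path; rules match by URL prefix.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /dashboard/
Disallow: /search
Sitemap: https://yourdomain.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def crawlable(path: str) -> bool:
    """Return True if a generic crawler may fetch this path."""
    return parser.can_fetch("*", "https://yourdomain.com" + path)


for path in ["/admin/users", "/dashboard/", "/search?q=shoes&sort=price", "/blog/robots-guide"]:
    print(f"{path}: {'crawlable' if crawlable(path) else 'blocked'}")
```

Because matching is by prefix, `Disallow: /search` would also block a page such as `/search-tips`. Keep rules as narrow as you can.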

What NEVER to block

  • CSS and JS files (Google needs them to render)
  • Your sitemap
  • Pages you actually want to rank
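One way to guard this list is to keep a set of URLs that must stay crawlable and test them against the file. The sketch below assumes a hypothetical `/assets/` folder and a helper named `blocked_urls`; neither comes from a real project. Python's parser applies the first matching rule in file order, while Google picks the most specific rule, so results can differ for files that mix Allow and Disallow on overlapping paths.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt: str, urls: list[str], agent: str = "Googlebot") -> list[str]:
    """Return the URLs from `urls` that the given crawler may NOT fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(agent, url)]


# A risky file: blocking /assets/ also blocks the CSS and JS inside it.
RISKY = """\
User-agent: *
Disallow: /assets/
Disallow: /admin/
"""

# Hypothetical URLs that should always be crawlable.
MUST_CRAWL = [
    "https://yourdomain.com/assets/site.css",
    "https://yourdomain.com/assets/app.js",
    "https://yourdomain.com/sitemap.xml",
    "https://yourdomain.com/pricing",
]

print(blocked_urls(RISKY, MUST_CRAWL))
```

An empty list means nothing on your must-crawl list is blocked.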

Disaster pattern

Disallow: / under User-agent: * blocks every compliant crawler from the entire site. Audit the live robots.txt after every deploy.
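A post-deploy audit can be as small as the sketch below. The function name `blocks_entire_site` is made up, and the check only asks whether the homepage is blocked, which is the telltale sign of this pattern. It also shows how close the disaster is to a harmless file: an empty Disallow allows everything, and a single added slash blocks everything.

```python
from urllib.robotparser import RobotFileParser


def blocks_entire_site(robots_txt: str, agent: str = "*") -> bool:
    """Return True if the given crawler is not allowed to fetch the homepage."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, "/")


# An empty Disallow value blocks nothing.
SAFE = "User-agent: *\nDisallow:\n"

# One extra character, and the whole site is off limits.
DISASTER = "User-agent: *\nDisallow: /\n"

print(blocks_entire_site(SAFE))
print(blocks_entire_site(DISASTER))
```

In a deploy pipeline you would pass in the text of the live file and fail the build when the function returns True.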
