Lesson 5
Robots.txt
Why this matters
One misplaced character in robots.txt can hide your entire site from Google. This file is small, powerful, and easy to break.
What it does
robots.txt tells well-behaved crawlers which URLs they may or may not crawl. It does NOT remove pages from the index; it only blocks crawling, so a blocked URL can still appear in search results if other pages link to it.
Safe defaults
User-agent: *
Allow: /
Sitemap: https://yourdomain.com/sitemap.xml
What to block
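- /admin/, /dashboard/: private areas
- Search-results pages with infinite parameter combinations
- Staging/preview environments (note that a crawler blocked by robots.txt cannot read a noindex meta tag on those pages; password protection is the more reliable safeguard)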
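As a rough sketch of how rules like these behave, the snippet below feeds a hypothetical robots.txt to Python's standard-library urllib.robotparser and checks a few URLs against it. The /search path, the helper name is_crawlable, and the example URLs are assumptions made for illustration. Python's parser uses simple prefix matching and does not interpret wildcards the way Google does, so treat the result as an approximation rather than a guarantee of how any particular crawler will act.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration; adjust the paths to your own site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /dashboard/
Disallow: /search
Sitemap: https://yourdomain.com/sitemap.xml
"""


def is_crawlable(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if `agent` may crawl `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse from memory, no network access
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in (
        "https://yourdomain.com/admin/users",
        "https://yourdomain.com/search/shoes",
        "https://yourdomain.com/blog/first-post",
    ):
        status = "allowed" if is_crawlable(ROBOTS_TXT, url) else "blocked"
        print(f"{status}: {url}")
```

Parsing the text in memory keeps the check independent of the network, so the same function can be run against a draft file before it is published.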
What NEVER to block
- CSS and JS files (Google needs them to render)
- Your sitemap
- Pages you actually want to rank
Disaster pattern
Disallow: / under User-agent: * blocks the entire site for every compliant crawler. Check the live robots.txt after every deploy.
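One way such a post-deploy audit might look is sketched below, again using Python's urllib.robotparser. The function name blocked_paths and the MUST_ALLOW list (a homepage, one CSS file, one JS file, and the sitemap) are hypothetical; substitute the paths that matter on your own site. The parser's prefix matching only approximates Google's rules, so this catches the obvious disaster rather than every subtle mistake.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical paths that must always stay crawlable; replace with your own.
MUST_ALLOW = ["/", "/assets/site.css", "/assets/app.js", "/sitemap.xml"]


def blocked_paths(robots_txt, must_allow=MUST_ALLOW, agent="*"):
    """Return the subset of `must_allow` that the robots.txt text blocks."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [path for path in must_allow if not parser.can_fetch(agent, path)]


if __name__ == "__main__":
    disaster = "User-agent: *\nDisallow: /\n"
    safe = "User-agent: *\nAllow: /\n"

    # The disaster pattern blocks every required path; the safe default blocks none.
    print("disaster:", blocked_paths(disaster))
    print("safe:", blocked_paths(safe))
```

In a deploy pipeline, the text of the published robots.txt could be passed to blocked_paths and the build failed whenever the returned list is not empty.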
