Create robots.txt for SEO Without Blocking Important Pages

Use robots.txt for crawl guidance, not security. The safest workflow is to keep public pages and assets crawlable, block only low-value paths, and verify that the file is served from the domain root.

Workflow

  1. List the public pages and assets that search engines must be able to fetch.
  2. Add Disallow rules only for crawl waste such as admin paths, internal search pages, or parameter-driven duplicate URLs.
  3. Add the canonical XML sitemap URL with the same protocol and host used by the live site.
  4. Generate the file, place it at /robots.txt, then open it directly in the browser.
  5. Check that blocked paths are not pages you expect to rank or appear in search results.
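
Steps 2 to 4 can be sketched in Python. This is a minimal illustration, not a complete generator: the function name build_robots_txt, the example paths, and the host www.example.com are all assumptions made for the example.

```python
def build_robots_txt(disallow_paths, sitemap_url):
    """Return robots.txt text that blocks only the given path prefixes."""
    lines = ["User-agent: *"]
    if disallow_paths:
        # One Disallow line per low-value path prefix (step 2).
        lines += [f"Disallow: {path}" for path in disallow_paths]
    else:
        # An empty Disallow value means nothing is blocked.
        lines.append("Disallow:")
    # Blank line, then the absolute sitemap URL (step 3).
    lines += ["", f"Sitemap: {sitemap_url}"]
    return "\n".join(lines) + "\n"


# Hypothetical paths and host, for illustration only.
robots_txt = build_robots_txt(
    ["/admin/", "/search"],
    "https://www.example.com/sitemap.xml",
)
print(robots_txt)
```

The printed text is what would be saved as robots.txt at the domain root. How the file reaches the server depends on the hosting setup, so that part is left out.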

Checks before production

  • Do not use robots.txt to hide private data because the file is public.
  • Avoid blocking CSS, JavaScript and image folders needed for rendering.
  • Use the Google Search Console robots.txt report (which replaced the retired robots.txt Tester) and the Crawl Stats report after deployment.
  • Pair robots.txt with an XML sitemap and canonical redirects.
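
Before deployment, a short script can flag must-crawl URLs that the rules would block. The sketch below uses Python's standard urllib.robotparser module. The rules, the URL list, and the helper name blocked_urls are hypothetical. Note that this parser applies simple prefix matching in file order and does not understand the * and $ wildcards that some search engines support, so treat the result as a first check rather than a guarantee of how any particular crawler behaves.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt, urls, user_agent="*"):
    """Return the URLs from `urls` that the rules disallow for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Hypothetical rules; the /assets/ rule is a deliberate mistake.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /assets/
"""

# Hypothetical pages and assets that must stay crawlable.
MUST_CRAWL = [
    "https://www.example.com/",
    "https://www.example.com/pricing",
    "https://www.example.com/assets/site.css",
]

print(blocked_urls(ROBOTS_TXT, MUST_CRAWL))
```

Here the stylesheet under /assets/ is reported as blocked, which is the kind of rendering problem the checklist warns about.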

FAQ

Can robots.txt remove an indexed URL?

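No. robots.txt controls crawling, not indexing, so a blocked URL can still be indexed if other pages link to it. To deindex a page, use noindex and leave the URL crawlable so crawlers can read the directive, or remove the URL.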
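
A noindex directive is usually delivered as a robots meta tag in the HTML or as an X-Robots-Tag response header. The following sketch checks a page for either form. The helper names are hypothetical, the headers argument is assumed to be a plain dict with the exact key "X-Robots-Tag", and crawler-specific forms such as "googlebot: noindex" are not handled.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            content = attr.get("content", "")
            self.directives += [d.strip().lower() for d in content.split(",")]


def has_noindex(html, headers):
    """Return True if the header or a robots meta tag contains noindex."""
    header_value = headers.get("X-Robots-Tag", "")
    header_directives = [d.strip().lower() for d in header_value.split(",")]
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in header_directives or "noindex" in finder.directives


# Hypothetical page, for illustration only.
page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(has_noindex(page, {}))
```

A crawler can only see either signal if it is allowed to fetch the page, so a URL carrying noindex should not also be disallowed in robots.txt.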

Should staging sites block all crawlers?

Yes, but staging should also use authentication. robots.txt alone is not access control.
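
A block-all staging file is two lines. The sketch below checks, with Python's standard urllib.robotparser, that such a file disallows a few sample paths. The constant, the helper name, and the sample paths are assumptions for illustration, and the check says nothing about authentication, which still has to be configured on the server.

```python
from urllib.robotparser import RobotFileParser

# Block-all rules for a staging host.
STAGING_ROBOTS_TXT = "User-agent: *\nDisallow: /\n"


def blocks_everything(robots_txt, sample_paths=("/", "/pricing", "/assets/site.css")):
    """Return True if none of the sample paths may be fetched by any crawler."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not any(parser.can_fetch("*", path) for path in sample_paths)


print(blocks_everything(STAGING_ROBOTS_TXT))
```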

Last updated: May 18, 2026