Use robots.txt for crawl guidance, not security. The safest workflow is to keep public pages and assets crawlable, block low-value paths, and verify the file from the domain root.
Workflow
- List the public pages and assets that search engines must be able to fetch.
- Add disallow rules only for crawl waste such as admin paths, internal search pages or parameterized URLs that duplicate existing pages.
- Add the canonical XML sitemap URL with the same protocol and host used by the live site.
- Generate the file, place it at /robots.txt, then open it directly in the browser.
- Check that blocked paths are not pages you expect to rank or render in search results.
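The workflow above can be sketched as a small generator. This is a minimal illustration, not a definitive tool: the function name `build_robots_txt`, the example paths and the `www.example.com` sitemap URL are all assumptions made for the example.

```python
def build_robots_txt(disallow_paths, sitemap_url, user_agent="*"):
    """Return robots.txt text for one user-agent group plus a sitemap line.

    disallow_paths: path prefixes treated as crawl waste (hypothetical examples below).
    sitemap_url: absolute sitemap URL using the live site's protocol and host.
    """
    for path in disallow_paths:
        if not path.startswith("/"):
            raise ValueError(f"Disallow paths must start with '/': {path!r}")

    lines = [f"User-agent: {user_agent}"]
    if disallow_paths:
        lines += [f"Disallow: {path}" for path in disallow_paths]
    else:
        # An empty Disallow value means nothing is blocked for this group.
        lines.append("Disallow:")
    lines += ["", f"Sitemap: {sitemap_url}"]
    return "\n".join(lines) + "\n"


# Hypothetical site: block the admin area and internal search, keep the rest crawlable.
robots_txt = build_robots_txt(
    ["/admin/", "/search"],
    "https://www.example.com/sitemap.xml",
)
print(robots_txt)
```

Save the output as a file named robots.txt at the domain root, then open /robots.txt in the browser to confirm that the served file matches.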
Checks before production
- Do not use robots.txt to hide private data because the file is public.
- Avoid blocking CSS, JavaScript and image folders needed for rendering.
- After deployment, review the robots.txt report and the crawl stats report in Google Search Console.
- Pair robots.txt with an XML sitemap and canonical redirects.
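Some of these checks can be run locally before the file goes live. The sketch below uses Python's standard-library `urllib.robotparser` to confirm that pages and rendering assets which must stay crawlable are not blocked. The robots.txt content, the URL list and the helper name `blocked_urls` are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content under test.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /search

Sitemap: https://www.example.com/sitemap.xml
"""

# Hypothetical pages and rendering assets that search engines must be able to fetch.
MUST_BE_CRAWLABLE = [
    "https://www.example.com/",
    "https://www.example.com/pricing",
    "https://www.example.com/assets/site.css",
    "https://www.example.com/assets/app.js",
]


def blocked_urls(robots_txt, urls, user_agent="*"):
    """Return the subset of urls that the given robots.txt disallows."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


problems = blocked_urls(ROBOTS_TXT, MUST_BE_CRAWLABLE)
if problems:
    print("Blocked but expected to be crawlable:", problems)
else:
    print("All required URLs are crawlable.")
```

The standard-library parser uses plain prefix matching and does not interpret the `*` and `$` wildcards that some search engines support, so treat it as a first check and confirm rules that use wildcards in the search engine's own tooling.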
FAQ
Can robots.txt remove an indexed URL?
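No. robots.txt controls crawling, not indexing, and a disallowed URL can still be indexed if other pages link to it. To deindex a URL, add a noindex directive or remove the page, and leave the URL crawlable so that search engines can see the change.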
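A noindex directive can be delivered either in an `X-Robots-Tag` response header or in a `<meta name="robots">` tag. The sketch below checks a saved response for either form. The helper name `has_noindex` and the sample inputs are assumptions, and the check is deliberately simplified: it ignores crawler-specific variants such as `googlebot: noindex` or `<meta name="googlebot">`.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name.lower(): (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            content = attr.get("content", "")
            self.directives += [d.strip().lower() for d in content.split(",")]


def has_noindex(headers, html):
    """Return True if the headers or the HTML carry a generic noindex directive."""
    header_value = ""
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            header_value = value
    header_directives = [d.strip().lower() for d in header_value.split(",")]

    meta = RobotsMetaParser()
    meta.feed(html)
    return "noindex" in header_directives or "noindex" in meta.directives


# Hypothetical responses pasted in for checking.
print(has_noindex({"X-Robots-Tag": "noindex, nofollow"}, ""))
print(has_noindex({}, '<meta name="robots" content="index, follow">'))
```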
Should staging sites block all crawlers?
Yes, but staging should also use authentication. robots.txt alone is not access control.
Last updated: May 18, 2026