Crawl Budget
Crawl budget is the number of URLs that Googlebot will fetch and process on your website within a given period. Google describes it as the combination of how much crawling your server can handle (crawl capacity) and how much Google wants to crawl your site (crawl demand), so in practice it depends on your site’s authority, speed, and server capacity. For small sites (under 1,000 pages), crawl budget is almost never a problem. For large sites (10,000+ pages), crawl budget management is a significant technical SEO discipline.
Example
An ecommerce site with 50,000 product pages may find that Googlebot only crawls 8,000 pages per day. If there is a new product launch with 200 pages that need to rank quickly, you need to ensure Googlebot prioritises those pages — otherwise they sit unindexed for days or weeks.
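To put rough numbers on this example, the sketch below estimates how long one full pass over the catalogue would take. It assumes, purely for illustration, that Googlebot fetches each URL exactly once at a constant daily rate. Real crawling revisits some URLs far more often than others, so the result is best read as a lower bound. The function name is hypothetical.

```python
import math


def days_for_full_pass(total_urls: int, crawled_per_day: int) -> int:
    """Days needed to fetch every URL once at a steady daily crawl rate."""
    if crawled_per_day <= 0:
        raise ValueError("crawled_per_day must be positive")
    return math.ceil(total_urls / crawled_per_day)


# Figures from the ecommerce example: 50,000 pages, 8,000 fetches per day.
print(days_for_full_pass(50_000, 8_000))  # 7
```

Even in this best case, a new page published at the wrong moment could wait about a week for its first visit unless something (a sitemap, internal links) moves it up the queue.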
By contrast, a medium-sized WordPress blog with 800 posts typically has no crawl budget issues: Googlebot crawls all 800 pages within days of publication.
What Affects Crawl Budget
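Server speed. Googlebot adjusts its crawl rate to your server’s capacity. If your site responds slowly (as a rough rule of thumb, above 3–5 seconds per page; Google publishes no fixed threshold) or returns server errors, Googlebot crawls fewer pages per session to avoid overloading your server.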
Site authority. Higher-authority sites (as measured by backlink count and quality) receive larger crawl budgets. Google invests more crawl resources in sites it deems more important.
URL duplication. Every duplicate URL wastes crawl budget. Common sources: URL parameters (filters, sort orders, pagination), session IDs appended to URLs, HTTP vs HTTPS versions of the same page, www vs non-www. A site with 50,000 real pages but 200,000 URL variants due to parameters is burning crawl budget on junk URLs.
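One way to measure this kind of duplication is to reduce every URL from a crawl export to a canonical form and count how many variants collapse into each one. The sketch below is a minimal illustration, not a complete normaliser: the parameter names in IGNORED_PARAMS are hypothetical placeholders, and it assumes the HTTPS, non-www version is the preferred one.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter names; replace with the ones your site actually uses.
IGNORED_PARAMS = {"sort", "sessionid", "utm_source", "utm_medium", "utm_campaign"}


def canonical_form(url: str) -> str:
    """Collapse scheme, www prefix, and ignorable parameters into one form."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    # Keep only parameters that change the page content, in a stable order.
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    )
    return urlunsplit(("https", host, parts.path or "/", urlencode(kept), ""))


def group_variants(urls):
    """Map each canonical URL to the list of crawled variants that reduce to it."""
    groups = {}
    for url in urls:
        groups.setdefault(canonical_form(url), []).append(url)
    return groups
```

Any canonical URL with more than one variant in the result is a candidate for a redirect, a canonical tag, or a robots.txt rule.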
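Robots.txt and noindex directives. A correctly configured robots.txt stops Googlebot from crawling URLs that don’t need to be indexed (admin pages, thank-you pages, internal search results), which frees crawl budget for the pages that do need to rank. A noindex directive works differently: Googlebot has to fetch a page to see the directive, so noindex keeps the page out of the index but does not save the crawl. For the same reason, a noindex on a URL that is blocked in robots.txt is never read.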
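Before deploying robots.txt rules, it can help to check them against a handful of real URLs. The sketch below uses Python's standard-library urllib.robotparser with an illustrative rule set; the paths and example.com URLs are assumptions, not recommendations for any particular site. Note that this parser does plain prefix matching and does not implement Google's wildcard extensions (* and $ inside paths), so it is only a rough check for simple rules.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules only; the paths are hypothetical.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /thank-you/
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_crawlable(url: str, agent: str = "Googlebot") -> bool:
    """Return True if the rules above allow the given agent to fetch the URL."""
    return parser.can_fetch(agent, url)


for url in (
    "https://example.com/products/blue-widget",
    "https://example.com/search?q=shoes",
    "https://example.com/wp-admin/options.php",
):
    print(url, is_crawlable(url))
```

Under these rules the product page is crawlable, while the internal search and admin URLs are not.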
Why It Matters for SEO
If Googlebot doesn’t crawl a page, Google cannot index its content, and the page cannot rank for it. On small sites, this almost never happens: Googlebot crawls everything within days. On large sites with thousands of pages and active publishing schedules, crawl budget management influences which of your pages enter Google’s index and how quickly.
The practical implications:
- New content may not rank quickly. If your crawl budget is consumed by junk URLs, new pages may wait days or weeks to be indexed.
- Pruning thin content improves crawl efficiency. Removing or consolidating low-quality pages frees crawl budget for your best pages.
- Faceted navigation is the biggest crawl budget risk on ecommerce. A site with 10,000 products and 15 filter dimensions can generate millions of URL variants from faceted navigation — most of which should be blocked in robots.txt.
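The scale of the faceted navigation problem follows from simple combinatorics. As a rough sketch, assume each filter dimension is either unset or set to exactly one of a fixed number of values, and that every combination produces its own crawlable URL. Both are simplifying assumptions, and the function name is hypothetical. The count for a single listing page is then (values + 1) raised to the number of dimensions, including the unfiltered page itself.

```python
def facet_url_variants(filter_dimensions: int, values_per_filter: int) -> int:
    """URL variants for one listing page when each filter is unset or has one value."""
    return (values_per_filter + 1) ** filter_dimensions


# 15 filter dimensions with only two possible values each.
print(facet_url_variants(15, 2))  # 14348907
```

Even with just two values per filter, a single category page can expand into more than 14 million URLs, which is why the product count matters far less than the number of filter dimensions.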
Tools That Help Manage Crawl Budget
- Screaming Frog at £149/yr: a desktop crawler for auditing crawl waste directly. Run a full crawl to find URL parameter variants, redirect chains, and duplicate content that consume budget.
- Semrush Pro at $139.95/mo: Site Audit flags crawl waste issues including duplicate pages, redirect chains, and blocked resources.
- Google Search Console (free): the Crawl Stats report shows how many URLs Googlebot crawls per day and the response codes it receives. Essential baseline monitoring.
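If you have access to server logs, a similar per-day and per-status view can be approximated directly. The sketch below is not how any of the tools above work; it assumes access logs in the common "combined" format and identifies Googlebot by user-agent string alone, which can be spoofed, so treat the counts as indicative. The function and pattern names are hypothetical.

```python
import re
from collections import Counter

# Matches one line of the Apache/Nginx "combined" log format (assumed here).
LOG_LINE = re.compile(
    r'\S+ \S+ \S+ \[(?P<day>[^:]+):[^\]]+\] '
    r'"(?:[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_hits(lines):
    """Count requests whose user agent mentions Googlebot, by day and by status."""
    per_day = Counter()
    per_status = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and "Googlebot" in match.group("agent"):
            per_day[match.group("day")] += 1
            per_status[match.group("status")] += 1
    return per_day, per_status
```

A rising share of redirect or error status codes in the second counter is a sign that crawl budget is going to URLs that return nothing useful.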