Crawl Budget: Too Many Pages Hurting Rankings

These figures come from a real audit. One website. Eighty-three pages showing in Google's index. On closer inspection, more than half, fifty-three, turned out to be tag pages, category archives, author pages, paginated archives, URL parameter duplicates, and thin old posts, all accumulated slowly over years of running WordPress.

Google scanned all eighty-three. Attention split thin across too many pages at once. Quality checks landed on spots adding zero value. Thirty useful pages lost strength, watered down by fifty-three extras doing nothing.

Fixing this ranks among the most overlooked technical SEO improvements. Most cases get resolved quickly once you know where to look.

What crawl budget actually means

Every website gets a limited amount of attention from Google. Time spent crawling isn't endless. Google caps how many pages it will fetch from a site in any given period, and that cap shifts depending on how strong your domain looks. The clearer the signal that your site matters, the more room you get.

When crawlers keep hitting shallow, low-value pages, they start doubting everything else. Trust erodes slowly. Strong material gets buried under bad impressions. What should surface ends up hidden.

The core problem

A weak page caught by Google does double damage. It burns through crawl resources. Then it quietly drags down how solid your entire domain looks to search engines.

The typical breakdown of a bloated WordPress site

On that 83-page site: 18 tag pages, 12 category archive pages, 15 thin or outdated posts, 8 author archive pages, and 30 genuinely useful core pages. The 30 useful pages were competing for crawl attention with 53 that should never have been indexed.
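To make the proportions concrete, here is a small sketch that tallies the breakdown above. The counts are the ones quoted from the audit; the dictionary labels and variable names are only illustrative.

```python
# Page counts taken from the audit breakdown above.
page_counts = {
    "tag pages": 18,
    "category archives": 12,
    "thin or outdated posts": 15,
    "author archives": 8,
    "core pages": 30,
}

total = sum(page_counts.values())
low_value = total - page_counts["core pages"]

print(f"Total indexed pages: {total}")
print(f"Low-value pages: {low_value} ({low_value / total:.0%})")

# Share of the index taken up by each page type.
for label, count in page_counts.items():
    print(f"{label:>24}: {count:>2} ({count / total:.0%})")
```

Run as written, this reports 53 low-value pages out of 83, roughly 64% of everything indexed.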

Tag and category archives: should be noindexed

Every time you assign a tag or category, WordPress creates another archive page for it. Fifty tags mean fifty extra pages filled with content Google already knows from your articles. These archives repeat what exists elsewhere, so set them to noindex and keep them out of search results.

Author archive pages: should be noindexed

With a single writer, the author page duplicates the main blog listing exactly. With multiple writers, author pages often lack any real substance. WordPress generates these automatically. Turning them off inside Rank Math or Yoast is a quick settings change, though Google still needs to recrawl the pages before they drop out of the index.

Paginated archives: remove or noindex

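Page 2, page 3, page 4: the same kind of listing under new URLs. Each one repeats content that already lives on the posts themselves, so indexing all of them is unnecessary. Noindex everything after the first page rather than blocking it in robots.txt, because Google has to crawl a page to see its noindex tag.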

Old thin content: needs review

Posts sitting around since 2015 with little substance weigh your site down. They confuse what your pages stand for. Either rebuild them with real value, redirect them to something relevant, or block search engines from using them while you phase them out.

Parameter URLs: should be excluded

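Pages showing up with extras like ?sort=price or ?ref=newsletter look different but aren't. Google might treat them as separate pages, creating duplication. Fix this with a canonical tag on each variant pointing at the clean URL, and keep internal links pointing at the clean URL too. Search Console's old URL Parameters tool was retired in 2022, so you can no longer tell Google there to ignore parameters.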
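As a rough illustration of how parameter variants collapse onto one address, the sketch below strips parameters that do not change page content and groups URLs that end up identical. The IGNORED_PARAMS set, the function names, and the example.com URLs are assumptions made for the example, not a list Google publishes; adjust them to the parameters your own site uses.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical set of parameters assumed not to change what the page shows.
IGNORED_PARAMS = {"sort", "ref", "utm_source", "utm_medium", "utm_campaign"}


def canonical_url(url: str) -> str:
    """Return the URL with ignored query parameters and fragments removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def group_duplicates(urls):
    """Map each canonical URL to its variants, keeping only real duplicates."""
    groups = defaultdict(list)
    for url in urls:
        groups[canonical_url(url)].append(url)
    return {canon: found for canon, found in groups.items() if len(found) > 1}


crawled = [
    "https://example.com/shop/",
    "https://example.com/shop/?sort=price",
    "https://example.com/shop/?ref=newsletter",
    "https://example.com/about/",
]
for canon, variants in group_duplicates(crawled).items():
    print(canon, "<-", variants)
```

Each group is a candidate for a single canonical tag: every variant in the list should declare the first URL as its canonical.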

Internal search result pages: needs review

Search result pages sometimes appear in Google by mistake. They are not useful at all, because they generate changing results every time. Look inside Search Console's Pages report (formerly Coverage) for paths like /search?q= or /results/. Without consistent content, Google should never be indexing these.

How to find your own problem pages

Step 1. Google Search Console Pages report

Go into Search Console and pull up the Pages report. Check the "Indexed" section carefully, looking for page types that don't belong: tags, categories, author pages, paginated archives, parameter URLs. This shows exactly what Google has decided to keep in its index.

Step 2. Site search

Type site:yourdomain.com.au into Google. See what shows up. Tag pages, author profiles, old posts you'd forgotten. The list is only a sample, not a complete inventory, but it gives a quick view of what Google has kept in its index from your site.

Step 3. Crawl your site

A tool like Screaming Frog, free for up to 500 URLs, moves through your pages like a scanner, listing every address it finds. You'll spot odd query strings, paths from past versions of the site, and forgotten content still live by accident. Search Console gives one view; a crawl tool turns up what slips under the radar.
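Once you have a list of URLs from a crawl or a Search Console export, sorting them by type makes the bloat visible. The sketch below is one way to do that, assuming default WordPress permalink patterns (/tag/, /category/, /author/, /page/N/) and the search paths mentioned earlier. The rule list, the classify function, and the sample URLs are hypothetical and will need adjusting for custom permalink structures.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Assumed search paths, based on the /search?q= and /results/ examples above.
SEARCH_PATH = re.compile(r"^/(search|results)(/|$)")

# Assumed default WordPress permalink patterns, checked in order.
PATH_RULES = [
    ("paginated archive", re.compile(r"/page/\d+/?$")),
    ("tag archive", re.compile(r"^/tag/")),
    ("category archive", re.compile(r"^/category/")),
    ("author archive", re.compile(r"^/author/")),
]


def classify(url: str) -> str:
    """Label a URL with the low-value page type it most likely belongs to."""
    parts = urlsplit(url)
    if SEARCH_PATH.search(parts.path):
        return "internal search"
    if parts.query:
        return "parameter URL"
    for label, pattern in PATH_RULES:
        if pattern.search(parts.path):
            return label
    return "other"


urls = [
    "https://example.com/services/",
    "https://example.com/tag/seo/",
    "https://example.com/category/news/page/3/",
    "https://example.com/author/doug/",
    "https://example.com/shop/?sort=price",
    "https://example.com/search?q=shoes",
]
summary = Counter(classify(url) for url in urls)
for label, count in summary.most_common():
    print(f"{label}: {count}")
```

Anything labelled "other" still needs a human look: the script can spot URL patterns, but it cannot judge whether a post is thin.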

How to fix it

Noindex tags

Sometimes pages just need to disappear from search results without being deleted. A noindex tag asks Google to drop the page from its index while keeping it live on your site. Once Google recrawls the page, it fades out of search results over days or weeks. Rank Math and Yoast both make this a single click on any post or page.
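In the page source, a noindex instruction is a robots meta tag in the head, for example a meta element with name="robots" and content="noindex, follow". To confirm a plugin has actually written it, you can check the HTML. The sketch below uses only Python's standard library; the class name, function name, and sample markup are illustrative, and it does not cover the X-Robots-Tag HTTP header, which can carry the same directive.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        if attr.get("name", "").lower() in {"robots", "googlebot"}:
            for token in attr.get("content", "").split(","):
                self.directives.add(token.strip().lower())


def has_noindex(html: str) -> bool:
    """Return True if the page carries a noindex directive in a robots meta tag."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


sample = """
<html><head>
  <title>Tag: SEO</title>
  <meta name="robots" content="noindex, follow">
</head><body>Archive listing</body></html>
"""
print(has_noindex(sample))
```

Splitting the content value on commas matters here: a page marked "index, follow" must not be mistaken for a noindexed one.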

Robots.txt

Pages blocked in robots.txt won't be crawled by Google going forward. But blocking them there leaves already-indexed pages untouched, and it also stops Google from seeing any noindex tag on them. For parameter URLs, target groups of URLs with a pattern rather than single addresses. If a page is already indexed, add the noindex tag first and leave the page crawlable until it drops out of the index; only then add the robots.txt disallow to stop future crawling.
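One thing worth checking when robots.txt and noindex are used on the same site: Google documents that a noindex directive is only honoured if the page can be crawled, so a URL that is both disallowed and noindexed may never be dropped. The sketch below lists such conflicts using Python's built-in urllib.robotparser. The robots.txt content, the URL list, and the function names are made up for the example. Note that this parser does plain prefix matching and does not understand the * and $ wildcards that Google supports, so it is only an approximation for simple rules.

```python
from urllib.robotparser import RobotFileParser


def crawl_allowed(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt text lets the agent fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def hidden_noindex(robots_txt: str, noindexed_urls):
    """Return noindexed URLs whose tag cannot be seen because crawling is blocked."""
    return [url for url in noindexed_urls if not crawl_allowed(robots_txt, url)]


# Hypothetical robots.txt using simple prefix rules only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /tag/
"""

# Hypothetical URLs that already carry a noindex tag.
noindexed = [
    "https://example.com/tag/seo/",
    "https://example.com/author/doug/",
]

for url in hidden_noindex(ROBOTS_TXT, noindexed):
    print("noindex will not be seen:", url)
```

Any URL this prints should have its disallow rule lifted until the page has left the index.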

Redirects instead of deletions

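Delete pages only when they truly offer nothing. Instead of removing them outright, point the old URL toward a closely related page with a 301 redirect. Deleting a URL that still has internal links or backlinks leaves a dead end and throws away whatever authority the page had earned. Redirect old paths to the closest live equivalent; where no relevant page exists, letting the URL return a 404 or 410 is better than redirecting it somewhere unrelated.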
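Before publishing a batch of redirects, it can help to verify that every old path really lands on a live page. The following is a minimal sketch under assumed inputs: a hand-maintained mapping of old paths to new ones and a set of paths known to be live. The paths and the check_redirects name are hypothetical.

```python
def check_redirects(redirects: dict, live_paths: set) -> dict:
    """Return a problem description for each redirect that misses a live page."""
    problems = {}
    for old, new in redirects.items():
        if new in redirects:
            # The target is itself an old path, so visitors hop twice.
            problems[old] = f"chains through {new}"
        elif new not in live_paths:
            problems[old] = f"target {new} is not live"
    return problems


# Hypothetical redirect plan and list of live pages.
redirects = {
    "/2015/old-seo-tips/": "/guides/technical-seo/",
    "/2016/seo-basics/": "/2015/old-seo-tips/",
    "/2017/link-building/": "/guides/links/",
}
live_paths = {"/guides/technical-seo/", "/services/"}

for old, issue in check_redirects(redirects, live_paths).items():
    print(f"{old}: {issue}")
```

A clean run prints nothing, which means every old path points directly at a page that exists.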

What happened on that site

Eight weeks after 53 weak pages were blocked from indexing and navigation paths were streamlined, core page positions climbed in search results. Google started visiting the 30 useful pages more frequently. Its overall assessment of the site improved measurably.

How this connects to AI visibility

The same patterns that determine how Google indexes your site affect how AI systems decide to cite it. When pages are packed with weak, repeated, or shallow content, AI systems learning about your topic notice those signals. Clean layout and clear topical focus help machines view your site as a reliable source worth referencing. Solid indexing shapes how engines see you, not just now but going forward.

Want us to audit your site?

Book an online session and we'll walk through your Search Console data, identify the pages dragging down your domain quality, and give you a clear action list.


About the author

Douglas Lord

Digital Authority & AI Visibility Strategist · Founder of Digital Dominator · Creator of PTODA

Doug Lord is a Digital Authority & AI Visibility Strategist and founder of Digital Dominator. He created the Periodic Table of Digital Authority™ (PTODA), an independent research framework for measuring digital authority, AI visibility and crawler accessibility, and is co-founder of OG01, where he serves as COO and CPO.
