These figures come from a real audit. One website. Eighty-three pages showing in Google's index. On closer inspection, fifty-three of them, nearly two-thirds, turned out to be tag pages, category archives, author pages, paginated archives, URL parameter duplicates, and thin old posts, accumulated slowly over years of running WordPress.
Google crawled all eighty-three. Its attention was spread thin across too many pages, and much of its quality evaluation landed on pages adding zero value. The thirty useful pages lost strength, watered down by fifty-three extras doing nothing.
Fixing this ranks among the most overlooked technical SEO improvements. Most cases get resolved quickly once you know where to look.
What crawl budget actually means
Every website gets a limited amount of crawl attention from Google. There is a practical cap on how many pages it will fetch from your site in a given period, and that cap shifts depending on how strong your domain looks. The clearer the signal that your site matters, the more room you get.
When crawlers keep hitting shallow, low-value pages, they start doubting everything else. Trust erodes slowly. Strong material gets buried under bad impressions. What should surface ends up hidden.
A weak page caught by Google does double damage. It burns through crawl resources. Then it quietly drags down how solid your entire domain looks to search engines.
The typical breakdown of a bloated WordPress site
On that 83-page site: 18 tag pages, 12 category archive pages, 15 thin or outdated posts, 8 author archive pages, and 30 genuinely useful core pages. The 30 useful pages were competing for crawl attention with 53 that should never have been indexed.
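The arithmetic behind that breakdown is easy to check. This is a minimal sketch using only the counts quoted above; the dictionary keys and variable names are illustrative.

```python
# Page counts as quoted for the audited 83-page site.
breakdown = {
    "tag pages": 18,
    "category archive pages": 12,
    "thin or outdated posts": 15,
    "author archive pages": 8,
    "useful core pages": 30,
}

total_indexed = sum(breakdown.values())
useful = breakdown["useful core pages"]
low_value = total_indexed - useful

# Share of the index taken up by pages that should not be there.
low_value_share = round(low_value / total_indexed * 100, 1)

print(f"{low_value} of {total_indexed} indexed pages are low value "
      f"({low_value_share}%)")
```

Running the same sum on your own counts gives a quick sense of how much of your index is doing no work.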
Tag and category archives: should be noindexed
Every time you assign a new tag or category, WordPress creates an archive page for it. Fifty tags mean fifty extra pages filled with content Google already knows from your articles. These archives repeat what exists elsewhere. Set them to noindex so they stay out of search results.
Author archive pages: should be noindexed
On a single-author site, the author page duplicates the main blog listing exactly. On multi-author sites, author pages often lack any real substance. WordPress generates these automatically. Disabling or noindexing author archives inside RankMath or Yoast takes a single setting.
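Paginated archives: remove or noindex
Page 2, page 3, page 4: the same kind of listing at new URLs. Each one repeats what came before. Indexing all of them is unnecessary. Rather than blocking crawling after the first page, set pages two onward to noindex, because a page Google cannot crawl cannot show it the noindex. Hiding them from search works best.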
Old thin content: needs review
Posts sitting around since 2015 with little substance weigh your site down. They blur what your site is about. Either rebuild them with real value, redirect them to something relevant, or noindex them while you phase them out.
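Parameter URLs: should be excluded
URLs carrying extras like ?sort=price or ?ref=newsletter look different but serve the same page. Google might treat them as separate pages, creating duplication. Fix this with a canonical tag on each variant pointing at the clean URL. Search Console's old URL Parameters tool, which once let you tell Google to ignore parameters, was retired in 2022, so canonical tags and consistent internal links now do that job.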
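To make the canonical idea concrete, here is a minimal sketch that strips the two example parameters and builds the matching canonical tag. The function names, the IGNORED_PARAMS set, and the example.com URLs are assumptions for illustration; which parameters are safe to ignore depends on your own site.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: these parameters never change the page content.
IGNORED_PARAMS = {"sort", "ref"}


def canonical_url(url, ignored=IGNORED_PARAMS):
    """Return the URL with ignored query parameters and any fragment removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in ignored
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_tag(url):
    """Build the <link rel="canonical"> element for a page variant."""
    return f'<link rel="canonical" href="{escape(canonical_url(url), quote=True)}">'


print(canonical_tag("https://example.com/shop/?sort=price"))
```

In practice an SEO plugin normally writes this tag for you. The sketch is mainly useful for auditing a crawl export to see which variants collapse to the same clean URL.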
Internal search result pages: needs review
Search result pages sometimes appear in Google by mistake. They are not useful there, because they just generate changing results every time. Look in the Search Console Pages report (formerly Coverage) for paths like /search?q= or /results/. Without consistent content, Google should never be indexing these.
How to find your own problem pages
Step 1. Google Search Console Pages report
Go into Search Console and pull up the Pages report. Check the "Indexed" section carefully, looking for page types that don't belong: tags, categories, author pages, paginated archives, parameter URLs. This shows exactly what Google has decided to keep in its index.
Step 2. Site search
Type site:yourdomain.com.au into Google. See what shows up. Tag pages, author profiles, old posts you'd forgotten. The list is a rough sample rather than a complete inventory, but it shows what Google is holding in its index from your site.
Step 3. Crawl your site
A tool like Screaming Frog, free for up to 500 URLs, moves through your pages like a scanner, listing every address it finds. You'll spot odd query strings, paths from past versions of the site, and forgotten content still live by accident. Search Console gives one view; a crawl tool turns up what slips under the radar.
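Once you have a list of URLs from a crawl, grouping them by type makes the problem pages stand out. This is a minimal sketch, assuming WordPress's default URL bases (/tag/, /category/, /author/, /page/N/) and the search paths mentioned earlier. The classify function and the sample URLs are hypothetical, so adjust the patterns if your permalinks are customised.

```python
import re
from collections import Counter
from urllib.parse import urlsplit


def classify(url):
    """Label a URL by the kind of page its path and query suggest."""
    parts = urlsplit(url)
    path = parts.path
    if path.startswith("/search") or path.startswith("/results/"):
        return "internal search"
    if parts.query:
        return "parameter URL"
    if re.search(r"/page/\d+/?$", path):
        return "paginated archive"
    if path.startswith("/tag/"):
        return "tag archive"
    if path.startswith("/category/"):
        return "category archive"
    if path.startswith("/author/"):
        return "author archive"
    return "other"


# Hypothetical crawl export.
crawled = [
    "https://example.com/services/",
    "https://example.com/tag/seo/",
    "https://example.com/category/news/",
    "https://example.com/author/doug/",
    "https://example.com/blog/page/3/",
    "https://example.com/shop/?sort=price",
    "https://example.com/search?q=seo",
]

summary = Counter(classify(url) for url in crawled)
for label, count in summary.most_common():
    print(f"{label}: {count}")
```

Anything labelled "other" still needs a human look, since thin or outdated posts cannot be identified from the URL alone.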
How to fix it
Noindex tags
Sometimes pages just need to disappear from search results without being deleted. A noindex tag quietly asks Google to drop the page from its index while keeping it live on your site. It fades out of search results naturally over days or weeks. RankMath and Yoast both make this a single click on any post or page.
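Whether a page actually carries the tag can be verified from its HTML. This is a minimal sketch, assuming the noindex is delivered as a robots meta tag. It can also be sent as an X-Robots-Tag HTTP header, which this does not check. The class name, function name, and sample markup are hypothetical.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            for directive in attr.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def has_noindex(html):
    """Return True if the page's robots meta tag includes noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


sample = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(has_noindex(sample))
```

Feeding it the saved HTML of a tag or author archive after changing the plugin setting is a quick way to confirm the change took effect.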
Robots.txt
Pages blocked in robots.txt won't be crawled by Google going forward. But blocking them there leaves already-indexed pages untouched, because Google can no longer fetch the page to see any noindex tag on it. For parameter URLs, target groups of URLs with a pattern rather than listing single addresses. If a page was already indexed, add the noindex tag first and leave the page crawlable until it drops out of the index. Only then add the robots.txt disallow to stop future crawling.
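Blocking a group of URLs rather than single addresses can be tried out locally before anything goes live. This is a minimal sketch using Python's standard urllib.robotparser. The rules, the blocked_urls helper, and the example.com URLs are made up for illustration. This parser matches plain path prefixes only, so it will not evaluate Google-style wildcard patterns.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: each Disallow line covers a whole group of URLs.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /tag/
"""


def blocked_urls(robots_txt, urls, agent="Googlebot"):
    """Return the URLs that the given rules stop the agent from crawling."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(agent, url)]


urls = [
    "https://example.com/services/",
    "https://example.com/tag/seo/",
    "https://example.com/search?q=seo",
]

for url in blocked_urls(ROBOTS_TXT, urls):
    print("blocked:", url)
```

Checking the list of blocked URLs against your core pages helps catch a rule that is broader than intended.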
Redirects instead of deletions
Delete a page outright only when it truly offers nothing and has no relevant replacement. Otherwise, point the old URL toward a closely related page with a 301 redirect. Leaving dead ends creates 404s, and crawl requests spent on them are wasted. Always redirect old paths to live ones.
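A redirect plan is worth sanity-checking before it goes live, so that every old path really does land on a live page. This is a minimal sketch with made-up paths; the check_redirects helper and its messages are hypothetical and not part of any plugin.

```python
def check_redirects(redirects, live_paths):
    """Return (old_path, problem) pairs for redirects that miss a live page."""
    problems = []
    for old, new in redirects.items():
        if old == new:
            problems.append((old, "redirects to itself"))
        elif new in redirects:
            problems.append((old, "target is itself being redirected"))
        elif new not in live_paths:
            problems.append((old, "target is not a live page"))
    return problems


# Hypothetical site: pages that stay, and old paths being retired.
live_paths = {"/services/", "/blog/seo-guide/"}
redirects = {
    "/2015/seo-tips/": "/blog/seo-guide/",      # fine
    "/2016/old-offer/": "/2015/seo-tips/",      # points at another retired page
    "/2017/newsletter/": "/thanks/",            # target does not exist
}

for old, problem in check_redirects(redirects, live_paths):
    print(f"{old}: {problem}")
```

Each reported path either needs a better target or, if nothing relevant exists, is a candidate for outright deletion.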
What happened on that site
Eight weeks after 53 weak pages were blocked from indexing and navigation paths were streamlined, core page positions climbed in search results. Google started visiting the 30 useful pages more frequently. Its overall assessment of the site improved measurably.
How this connects to AI visibility
The same patterns that determine how Google indexes your site affect how AI systems decide to cite it. When pages are packed with weak, repeated, or shallow content, AI systems learning about your topic notice those signals. Clean layout and clear topical focus help machines view your site as a reliable source worth referencing. Solid indexing shapes how engines see you, not just now but going forward.
Book an online session and we'll walk through your Search Console data, identify the pages dragging down your domain quality, and give you a clear action list.
20+ years in SEO and digital strategy. Founder of Digital Dominator, douglord.com, and private AI visibility diagnostic systems. Based in Byron Bay, working with clients worldwide.