Manage your crawl budget effectively

Bastian Grimm

Bastian Grimm from Peak Ace AG has noticed a significant but overlooked issue arising from the proliferation of AI-generated content: Google struggling to find the budget to crawl and index your site.

@basgr    

Bastian says: “I’m going to talk about how to tackle the growing challenge of managing crawl efficiency and crawl budget in an era where search engines like Google are overwhelmed by the sheer volume of content that’s currently being produced.”

How do you manage Google’s capacity to crawl what you have to offer?

“Right now, we have this content explosion. Everyone and anyone is using AI-supported tools that make content creation faster and cheaper than ever – and it’s creating real challenges for Google. They’re struggling to crawl and index it all. Despite all the fancy algorithms they have, they just can’t keep pace with this increasing volume. That is the biggest problem.

Managing crawl budget is really crucial, and it involves best practices that we have talked about for a long time. It’s not rocket science, but it’s even more important this year. Years ago, you would only have done this extensively with sites that were very sizable. Right now, even a mid-size site with five or six figures of URLs will see benefits.

To go about it, there are a handful of things that are straightforward, but it’s worth quickly echoing them. A good starting point is the consolidation of low-value pages. Merge or eliminate the pages that are either thin, duplicate, or don’t bring any value. Deleting URLs/inventory will ensure that other stuff is crawled more frequently.

The same goes for internal linking. That’s a topic that SEOs talk about a lot, and rightfully so, but it does make a big difference if you link the right pages with each other and build a certain hierarchy. It helps Google to crawl and index the most important pages on a domain or site.

From a tooling perspective, you can use robots.txt, robots meta tags, and the X-Robots-Tag header, which prevent crawling and/or indexing. Depending on the use case, this is the weaponry that you have in your arsenal.
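
For illustration, those three mechanisms look roughly like this in practice (the paths and file type below are placeholders, not recommendations):

    # robots.txt – blocks crawling of low-value URL spaces
    User-agent: *
    Disallow: /internal-search/
    Disallow: /filter/

    <!-- robots meta tag – the page can be crawled but stays out of the index -->
    <meta name="robots" content="noindex, follow">

    # X-Robots-Tag – the same noindex signal sent as an HTTP response header, e.g. for PDFs
    X-Robots-Tag: noindex

Note that robots.txt controls crawling, while the meta tag and header control indexing; a URL blocked in robots.txt cannot have its noindex directive seen.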

Then, there are XML sitemaps. Again, this is not brand new, but a lot of sites struggle with maintaining or creating valid, error-free sitemaps in the first place. It’s a strong signal for Google – also from a canonicalization standpoint – of which URLs are more important than others.
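
For reference, a minimal, valid entry follows the standard sitemap protocol (the URL and date are placeholders); keeping redirected, noindexed, and broken URLs out of the file is what makes it an error-free signal:

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <loc>https://www.example.com/important-page/</loc>
        <lastmod>2025-01-15</lastmod>
      </url>
    </urlset>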

There’s also a protocol called IndexNow. Even though Google does not officially support it, Bing does, Yandex does, and others do as well. That will help with overall efficiency for those other engines.
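
As a minimal sketch, a single IndexNow submission can be made with one HTTP POST (the domain, key, and URLs below are placeholders; the key file must also be hosted on the domain):

    import requests  # third-party HTTP client, assumed to be installed

    payload = {
        "host": "www.example.com",
        "key": "your-indexnow-key",
        "keyLocation": "https://www.example.com/your-indexnow-key.txt",
        "urlList": [
            "https://www.example.com/new-page/",
            "https://www.example.com/updated-page/",
        ],
    }

    # api.indexnow.org shares the submission with participating engines (Bing, Yandex and others)
    response = requests.post("https://api.indexnow.org/indexnow", json=payload, timeout=10)
    print(response.status_code)  # 200 or 202 means the submission was accepted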

That is how I would go about it. The other part of this is how you can combine that with the hip topic of AI. We’ve talked a bit before about AI tooling on the content side, but there are also a whole lot of technical tasks like automating internal linking, redirect management, and URL mapping that you can do with the help of AI tools.”

Is it more about the quality of pages than the total number, and being honest with yourself about which pages Google would want to have in their index?

“Depending on things like how trusted your domain is and how old it is, you’re allowed more or fewer URLs in Google’s index. Generally speaking, though, you should look at the pages that actually provide value to users.

Information gain is incredibly important. There might be pages where there’s no information gain, like your privacy policy or a contact form. They don’t necessarily drive organic traffic, so they are pages that might not need to go into the index. Depending on your navigation set-up, certain types of category pages don’t provide any value either. These types of things.

Understand your inventory, and then ask yourself – for types of pages or types of templates – does it make sense that people would end up on those pages when they come in from Google? That is the approach. In that way, less is more.”

For healthy growth, is there a maximum percentage increase to your existing web pages that you would recommend per month?

“Considering the relation to your current content is a solid way of approaching it. If you have 100 pages and you start adding thousands more every month, that’s a spike that would not look natural, for many reasons. That might catch Google’s attention. You might pass a certain threshold, and then you’d be in trouble.

The bigger question is not so much about the number of pages but, for what you do create (whether it’s AI-assisted or human-created), does it actually provide value to the audience that you’re trying to attract? If that’s the case, I think you’re okay. If not, then that is the bigger problem.

It’s not so much about adding 10, 11, or 12 pages, but whether those pages bring value. Do they fit in contextually and from a semantics perspective? Do they provide value to the users that would end up on your site? I don’t think it’s necessarily about whether it’s AI or not.”

What does quality AI-assisted content look like?

“I don’t think it looks much different from content that’s purely human-made. If you look at it from a process standpoint, what querying a large language model does really well is shortcut parts of the research process. Obviously, you still need to fact-check, but it can be a shortcut there.

Once you have done the research, it can give you a shortcut to turn the whole thing into a document outline, for example. It can even help you pre-draft certain sections of that content.

The major difference is between content that is purely AI versus AI-assisted. Assisted means that all the facts have been human-checked and quality-assured because there’s always stuff in there that’s definitely wrong. No matter how well you prompt, even if you are using a tone of voice and providing input samples, there will always be giveaways in the grammar and language that it has come from a large language model. You need the human editing component as well.

The underlying problem remains the same, whether it’s done by a human or an AI, so it doesn’t look different in terms of output. If it doesn’t provide value, because it’s commodity content that’s already on the web, then it’s not worthwhile. ‘10 Travel Tips for Greece’ is something that Google can consolidate based on all the sources they have already vetted. If you go and curate that, whether you use AI or not, there’s no real information gain.

However, if you can add a very unique perspective on a certain destination in Greece, because you’re the only person talking about it who has been there recently, that’s an information gain. That’s the difference; does it actually provide value? It sounds abstract, but it’s a simple question.

Also, when you assess these things – when you’re doing consolidation, cleanups, and merging things – you need to be very thorough and very honest with yourself. There will be bits and pieces that are older, outdated, and not relevant anymore. They don’t provide value, so they can go.”

Is there a particular AI tool that you recommend for content creation at the moment, and is there a process that works best?

“From a tooling standpoint, it depends on language, but also style. None of the large language models can be used out of the box right now. That would just return standardised garbage. If you are more in the ChatGPT world, you want to use a custom GPT where you can pre-train it with the context and an example of what you expect it to produce. Provide it with input, but also example output for format, tone of voice, etc.

Custom GPTs are pretty cool for that because, when you share one between team members, they don’t have to redo these things a million times – otherwise someone will do it differently and all of a sudden you have a different output.

I do like to work with Claude quite a lot. They do a really good job. Oftentimes, even in the draft stage, it already sounds much more natural. There are a whole bunch of more specialist writing tools as well, so it really depends on the type of content you want to produce. If you start scaling, then you’re moving away from an interface anyway. You’re more on the API side.

We’ve been building a lot of stuff around the OpenAI API. You have things like prompt sequencing, and you have many more settings that you can play with to really come to the output that you want or need.
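
As a rough sketch of what that kind of prompt sequencing can look like against the OpenAI API (the model name, prompts, and temperature below are illustrative assumptions, not the agency’s actual set-up):

    from openai import OpenAI  # official openai Python package, assumed installed

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    research_notes = "Fact-checked notes about crawl budget go here."  # placeholder input

    def ask(system_prompt, user_prompt):
        # One step in the sequence: each call's output becomes the next call's input
        response = client.chat.completions.create(
            model="gpt-4o",      # illustrative model choice
            temperature=0.3,     # lower temperature keeps drafts closer to the brief
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": user_prompt},
            ],
        )
        return response.choices[0].message.content

    outline = ask("You draft outlines in our house tone of voice.",
                  "Turn these research notes into a document outline:\n" + research_notes)

    draft = ask("Write only the requested section and flag any claim that needs fact-checking.",
                "Outline:\n" + outline + "\n\nPre-draft the first section.")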

If you are using it to help with internal linking, then training your own model with your own content is probably the way to go – especially at enterprise scale. Default matching one piece of content to another is nice, but it’s not integrated enough. You will obviously produce new content, so how is that being interlinked into this whole thing?

For one of our clients, we created an integration with large language models. We dump in newly produced content and refreshed content as well. Then you have all the semantics in the model, and it can give you back the best possible link targets, one way or the other. It’s a living process, rather than a static one. We’re not just doing internal linking once and then it’s gone. Integrating it on the CMS side can make a big difference.”
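
A much-simplified sketch of that semantic matching, using embeddings and cosine similarity to propose link targets for a new piece of content (the model name and example pages are placeholders; a production version would live on the CMS side and re-embed content as it is published or refreshed):

    import numpy as np
    from openai import OpenAI  # official openai Python package, assumed installed

    client = OpenAI()

    # Existing pages on the site – in practice, pulled from the CMS
    pages = {
        "/guides/crawl-budget/": "How to manage crawl budget on large sites...",
        "/guides/xml-sitemaps/": "Creating and validating error-free XML sitemaps...",
        "/guides/indexnow/": "Submitting new and updated URLs with IndexNow...",
    }

    def embed(texts):
        resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
        return np.array([item.embedding for item in resp.data])

    page_urls = list(pages)
    page_vectors = embed(list(pages.values()))

    # A newly produced article that needs internal links
    new_article = "Why clean sitemaps help search engines prioritise your most important URLs..."
    new_vector = embed([new_article])[0]

    # Cosine similarity: the highest-scoring existing pages become candidate link targets
    scores = page_vectors @ new_vector / (
        np.linalg.norm(page_vectors, axis=1) * np.linalg.norm(new_vector)
    )
    for url, score in sorted(zip(page_urls, scores), key=lambda pair: -pair[1]):
        print(f"{score:.2f}  {url}")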

If an SEO is struggling for time, what should they stop doing right now so they can spend more time doing what you suggest in 2025?

“A lot of the tasks that have historically been done manually or semi-manually, in Excel with different mini tools, can now be pre-produced in a large language model. Question your processes and your deliverables.

I’m not saying to stop one or the other. I’m saying you should try to squeeze for more efficiency in existing things. It depends on what your job is, but if you have been doing schema markup, meta descriptions, or page titles by hand, that is something that you can draft at scale.

The same is true for redirects. If you’re doing a migration and you have to map out the old URLs to new URLs, you will get 80-90% accuracy by letting the machine do the work, and then you check it. It’s much faster than starting from scratch and doing it all manually.”
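
A deliberately simple illustration of that machine-first, human-check-after workflow for redirect mapping (the URLs are made up, and a real migration would usually also match on page titles or embeddings rather than paths alone):

    from difflib import get_close_matches  # standard library string similarity

    old_urls = ["/products/red-running-shoes", "/blog/crawl-budget-tips", "/about-us"]
    new_urls = ["/shop/running-shoes-red", "/insights/crawl-budget-tips", "/company/about"]

    # First pass by the machine: closest textual match for each old path
    redirect_map = {}
    for old in old_urls:
        match = get_close_matches(old, new_urls, n=1, cutoff=0.4)
        redirect_map[old] = match[0] if match else None  # None = needs manual mapping

    # Second pass by a human: review the proposed 301s before they go live
    for old, new in redirect_map.items():
        print(f"301: {old} -> {new}")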

Bastian Grimm is CEO at Peak Ace AG, and you can find him over at PeakAce.agency.

@basgr    

Also with Bastian Grimm

SEO in 2024
Familiarise yourself with the capabilities and limitations of large language models

Just as we need to get into the nuts and bolts of SEO, we also need to take the time to understand the technology that powers AI, as Bastian Grimm from PeakAce explains.

SEO in 2023
Get your head around AI

Bastian Grimm informs SEOs that you can no longer be ignorant about the way AI and machine learning are impacting the online environment in 2023. You need to start understanding now if you want to stay in front of the pack.

SEO in 2022
Are you prepared for Platform-less, server-less SEO on the edge?

Bastian's tip follows on from Nick's, highlighting that platform-less, server-less SEO has the potential to make several SEO operations much more efficient.

