Katherine Nwanorue says: “My additional insight is that the Crawl Stats report in Google Search Console isn’t a much-talked-about report, but you can find insights in it that really help you optimize crawling and search performance. So if you don’t have access to your website’s log files, then you should look to the Crawl Stats report for additional opportunities to optimize your website.”
Okay, so why is this so important? And how often should SEOs be looking at these reports?
“Okay, so this is really important because, let’s say, you have a really large website, so you have no idea how many of those pages Googlebot is crawling in a day or session, and you would like to know what it’s accessing and how to optimize this so your valuable pages are crawled. That’s where the Crawl Stats report comes in. You can use it to find issues with your website such as server errors, you can use it to find areas of your website that are having trouble being crawled, or you can use it to find the file types you particularly need Googlebot to access on your website. How often should you check it? It’s not something you really need to check every week, so if everything is okay with your site, or you don’t add lots of pages regularly, then checking it monthly is fine.”
SEOs should check out these reports because they might not necessarily have access to log files. So why typically might SEOs not have access to log files, and are log files always better?
“Log files are better for finding information on all the search engines that are crawling your website, not just Google, as the Crawl Stats report will only show you information on Google’s crawling behavior. You might not have access to your website’s log files if you’re working for a large organization that is really, really possessive about its data, so they might not be willing to grant you immediate access to those log files; it might take a while. And in some instances, if you need to optimize something really quickly, you might not have three months, or even two months, to wait before accessing those files. Or maybe you’re not the developer type, or you don’t have the developer skills to know how to access your website’s server or its log files. So in this case, the Crawl Stats report is a really good alternative.”
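If you do later get access to your server logs, even a short script can surface the same kind of crawl information the report summarizes. Here is a minimal sketch in Python, assuming a combined-format access log; the file name, date regex, and user-agent filter are placeholders to adapt to your own server.

```python
# Minimal sketch: count Googlebot requests per day in a combined-format
# access log. "access.log" and the date regex are assumptions; adjust them
# to your server's log location and format.
import re
from collections import Counter

DATE = re.compile(r"\[(\d{2}/\w{3}/\d{4})")  # grabs "10/Oct/2023" from "[10/Oct/2023:13:55:36 +0000]"

def googlebot_hits_per_day(path="access.log"):
    hits = Counter()
    with open(path, encoding="utf-8", errors="ignore") as log:
        for line in log:
            if "Googlebot" not in line:  # crude user-agent filter; verify with reverse DNS if spoofing matters
                continue
            match = DATE.search(line)
            if match:
                hits[match.group(1)] += 1
    return hits

if __name__ == "__main__":
    for day, count in googlebot_hits_per_day().most_common():
        print(day, count)
```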
There are two main sections in the Crawl Stats report: the ‘hosts’ section and the crawl stats drill-down section. Starting with the ‘hosts’ section, what can you find in there?
“There are three main things to look out for in this section. First, you’ll typically see the robots.txt availability information. Here you can see if your robots.txt file hasn’t been accessible in the past, or doesn’t return an acceptable response – a 200 response status is good, it means the file exists, while a 410, a 401, or a 403 means Google treats it as if it doesn’t exist. It’s not mandatory to have a robots.txt file, though, so whether you have one or not is fine; but if you do have one, then as long as it doesn’t respond with a 429 or a 503 server error, you don’t have a problem. If you do find responses with those two errors, it could really affect how often your site is being crawled, so you have to look out for this.
Another thing to look out for is your DNS resolution. If you’re having issues with your domain name, or anything like that, and you really have no idea what is going on, you can check this part of the Crawl Stats report for insights into whether your domain provider is having issues, or whether something isn’t set up properly.
Then the third piece of information you’ll find in this section is server connectivity. Server errors can really mess up your website’s growth and slow you down for a period of time. So if you’re having server errors, it’s best to speak with your developer or your provider to get this sorted out as fast as possible.”
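For anyone who wants to spot-check those three host-level signals (robots.txt availability, DNS resolution, and server connectivity) outside of Search Console, here is a minimal Python sketch using only the standard library; example.com is a placeholder domain.

```python
# Minimal sketch: quick checks for the host-level signals discussed above.
# "example.com" is a placeholder; swap in your own domain.
import socket
import urllib.error
import urllib.request

DOMAIN = "example.com"

# 1. DNS resolution: does the domain resolve at all?
try:
    print("DNS resolves to:", socket.gethostbyname(DOMAIN))
except socket.gaierror as err:
    print("DNS lookup failed:", err)

# 2 & 3. robots.txt availability and basic server connectivity.
req = urllib.request.Request(f"https://{DOMAIN}/robots.txt", method="HEAD")
try:
    with urllib.request.urlopen(req, timeout=10) as resp:
        print("robots.txt status:", resp.status)   # 200 is fine
except urllib.error.HTTPError as err:
    print("robots.txt status:", err.code)          # 4xx is treated as "no robots.txt"; 429/5xx can throttle crawling
except (urllib.error.URLError, socket.timeout) as err:
    print("Server unreachable:", err)
```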
This is great advice, because I’m sure that many SEOs are a little bit guilty of focusing too much, or all of their time, on the front end and not realizing the back-end issues that can impact their success. Things like DNS resolution, knowing if there are any issues with the way that your domain is set up and where it’s pointing, can obviously have a massive impact. How often should SEOs be checking on these kinds of areas?
“The Crawl Stats report covers the last 90 days. If you have other third-party tools you’re using to monitor your website as well, then you can get away with checking this report every two weeks or so. Two weeks is a realistic timeframe, because not everyone has the resources or the time to check this stuff.”
Is there any way of setting up some kind of automatic checking system to receive an email if some issue crops up so you can go in and sort it?
“In most cases you’ll receive a notification from Google. However, I have noticed that the hosts section in particular is not really reliable, specifically when it comes to receiving notifications on that section of Google Search Console. In the past, I’ve worked on a website where there were issues with the host for a couple of days, and we didn’t receive any notification for it, so we had to actually check before we found out that this was an issue. So yes, the only way to receive those notifications automatically is if you get an email from Google. Aside from that, you could depend on third-party tools. I think Content King has something that helps you monitor your website in real time, but I’m not sure if it covers a hosts section.”
And the second part of the Crawl Stats report is the crawl stats drill-down. That section breaks crawl requests down by four different items. Starting off with number one, ‘HTTP response’.
“So just like the name suggests, you’ll find a list of crawled URLs and their response codes. You don’t want search engines, or Googlebot in this case, to spend all their time crawling pages that don’t respond with a 200 status code, although in some cases, such as a site migration, you would also want them to crawl those redirected pages so they pick up the new domain. If, for instance, you view this section and you find that maybe 20% of your pages are returning a 503 server error code, then you might have a big problem on your hands, because it could really affect how many pages are being crawled on your website. Another insight you can draw from this section is that maybe you have pages you removed a while back, but for some reason crawlers are still hitting those pages. This is really a time for you to review these pages: check them in Google Analytics to find out if users are also accessing them, or check their backlink profiles in third-party tools to find out if they still have backlinks. So if these pages are still being hit by crawlers, users are accessing them, and they have backlinks that were really difficult to acquire and are really valuable, then it might make sense to redirect them to the page that delivers the most relevant experience to users.”
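Before deciding which of those removed pages deserve a redirect, it can help to re-check their current responses in bulk. A minimal sketch, with a placeholder URL list that would in practice come from the report export or your CMS:

```python
# Minimal sketch: re-check old URLs that crawlers are still hitting.
# The URL list is a placeholder; export the real one from the report.
import urllib.error
import urllib.request

OLD_URLS = [
    "https://example.com/discontinued-product",
    "https://example.com/old-category/",
]

for url in OLD_URLS:
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            # Redirects are followed automatically, so geturl() shows the final target.
            print(resp.status, url, "->", resp.geturl())
    except urllib.error.HTTPError as err:
        print(err.code, url)            # 404/410: candidate for a targeted redirect or for pruning internal links
    except urllib.error.URLError as err:
        print("ERR", url, err.reason)
```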
Next is ‘file type’. So why is ‘file type’ important?
“Okay, so this depends on the type of resources you rely on for your website, right? Let’s say you have an eCommerce website, or a photography website, that relies heavily on images, and images make up a huge amount of your search performance, and you go to the Crawl Stats report by file type and find out that HTML pages are being crawled far more than your images. Then you might need to check what’s wrong. Are your images too large? Are they slowing down the site, or slowing down the server so much that search engines can’t access them? Or maybe they’re not well optimized and there’s something wrong with these images? Or say the reverse is the case and you rely heavily on JavaScript, but instead crawlers are mostly accessing your HTML pages; then you might need to find out if you have issues with your JavaScript: maybe you blocked it with your robots.txt file, or maybe it’s faulty or not implemented correctly. So this can give you an idea of the page resources crawlers are accessing and how you can best optimize them to improve your search performance.”
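One of those possibilities, resources blocked by robots.txt, is quick to test with the standard-library parser. A minimal sketch, with a placeholder domain and placeholder image and JavaScript URLs:

```python
# Minimal sketch: check whether key image/JS resources are blocked for
# Googlebot in robots.txt. Domain and resource URLs are placeholders.
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()  # fetches and parses the live robots.txt

RESOURCES = [
    "https://example.com/images/hero-product.jpg",
    "https://example.com/assets/app.js",
]

for url in RESOURCES:
    allowed = rp.can_fetch("Googlebot", url)
    print("allowed" if allowed else "BLOCKED", url)
```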
That brings us to the third crawl requests type ‘purpose’. So what do we mean by ‘purpose’?
“Purpose is classified into two major areas: Discovery and Refresh. For Discovery, we are talking about new pages that Googlebot has not crawled before. On the Refresh side, we are talking about re-crawling known URLs. Just like in the other sections I mentioned, the insights you can find in this section depend on your needs. If you have a website where you add many new pages daily or weekly, then you would definitely want those new pages to be found, so your Discovery percentage should ideally be higher than your Refresh percentage. But on the other hand, if you don’t really add pages to your website, and during a specific period you decide you want to refresh, say, 100 pages on your website, and after refreshing them for a while you still find that nothing is really moving on the Refresh side, then you might really need to optimize your website: you might need to optimize your internal linking, you might need to optimize your site structure, or maybe for some reason you have orphaned pages that are so deep that crawlers can’t access them.”
And finally, ‘Googlebots’. So what do we need to know about Googlebots and what they do?
“Okay, this is a bit similar to the File Type section: you’ll find crawl requests based on the type of Googlebot that visited your pages within a specific time. So you’ll have smartphone bots, image bots, or video bots, and you can use this to understand which particular search engine bots are visiting your pages. If image bots are visiting your site a lot, that might be a good thing if you have lots of images. Whereas on the other hand, there is a ‘page resource’ bot that accesses your scripts, and if it’s accessing your scripts but skipping the images you still want found, then you might have an issue on your hands.”
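If you do have log access, you can mirror this breakdown yourself by tallying requests per Googlebot type. A minimal sketch, using Google’s publicly documented user-agent tokens; the log path is a placeholder:

```python
# Minimal sketch: tally access-log lines by Googlebot type, roughly mirroring
# the "By Googlebot type" view. The log path is a placeholder.
from collections import Counter

BOT_TOKENS = {                          # checked in order: most specific tokens first
    "Googlebot-Image": "Image",
    "Googlebot-Video": "Video",
    "Googlebot-News": "News",
    "AdsBot-Google": "AdsBot",
    "Googlebot": "Smartphone/Desktop",  # generic token; look for "Android" in the UA to split further
}

def googlebot_type_split(path="access.log"):
    counts = Counter()
    with open(path, encoding="utf-8", errors="ignore") as log:
        for line in log:
            for token, label in BOT_TOKENS.items():
                if token in line:
                    counts[label] += 1
                    break               # first (most specific) match wins
    return counts

print(googlebot_type_split())
```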
Should every SEO be diving into this kind of information? Do you think that there should be different types of SEOs, and that it’s okay for only some SEOs to be looking at the back end? Or do you think every single SEO, even if they’re quite strategic and creative, needs to know the information that you’ve just shared?
“It depends. If you’re someone who works more on the content side or the strategy side, then maybe the technical side isn’t your forte, or your main strong point. In that case, it would make sense to have someone on your team who is really comfortable with the technical side, so they can draw these insights and you can add them to your strategy. But if you’re really comfortable with the technical side, and you can draw these insights and interpret what they all mean, then yes, it does make sense to pay attention to it.”
So you’ve shared what SEOs should be doing in 2023. Now let’s talk about what they shouldn’t be doing. What’s something that’s seductive in terms of time, but ultimately counterproductive? What’s something that SEOs shouldn’t be doing in 2023?
“Okay, I might be biased, because I have a thing about 404 pages, but I’d say it’s implementing blanket redirects to the homepage. 404 pages are a normal part of the web, but for some reason, when most people remove a page, they redirect it to the homepage. That’s not really a good idea, because, as John Mueller has said in the past, you could lose the signals associated with those pages if you keep redirecting every page you remove back to the homepage. It could also trigger a soft 404 error. So it makes better sense to review these pages first. If a page has been removed but is still a bit valuable, because users and crawlers are trying to access it and it has backlinks that are really valuable, then it makes sense to redirect it to the most relevant page you have. That could be a category page, a close substitute, or another page that matches the same intent and user experience. Redirecting all these pages to your homepage does save time, because there are plugins that can do it for you and you can automate it so every removed page goes straight to your homepage, but it is not the best approach.”
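To make the alternative concrete, here is a minimal sketch of a hand-reviewed redirect map instead of a blanket rule, assuming Flask purely for illustration; the paths and targets are made up, and in a real site your normal routes would take precedence over this catch-all.

```python
# Minimal sketch: redirect each removed page to its most relevant replacement,
# and let everything else return a genuine 404 rather than the homepage.
from flask import Flask, abort, redirect

app = Flask(__name__)

# Hand-reviewed map: removed path -> closest substitute or category page (placeholders).
REDIRECT_MAP = {
    "/discontinued-red-widget": "/widgets/red",  # close substitute
    "/summer-sale-2021": "/sale",                # matching intent
}

@app.route("/<path:old_path>")
def removed_page(old_path):
    target = REDIRECT_MAP.get(f"/{old_path}")
    if target:
        return redirect(target, code=301)  # permanent redirect to the most relevant page
    abort(404)                             # a real 404 is fine; don't mask it with the homepage
```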
Katherine Nwanorue is a junior SEO at Fusion Inbound, and you can find her over at fusioninbound.com.