Andreas Voniatis says: “So many SEOs make decisions based on averages, but they don't look at the variation, and being that SEO is data rich, SEOs should take a statistical approach to SEO. And this is all detailed in my book, Data Driven SEO.”
Many SEOs say that they're data driven, and I'm sure that it would mean different things to different people. So what does Data Driven mean to you?
“I’ll start with what it shouldn't mean: at the very basic level, it doesn't mean data entry. What it does mean is not just looking at the average, but looking at the variation. In stats speak, that includes the standard deviation, because averages can be highly deceiving. Obviously there are situations where you get data from Google Search Console, and you have no choice but to analyze your existing data. But you can still do things like looking at the average and looking at the standard deviation. This is all theoretical, but to give you a more practical, exciting example: if you tracked the standard deviation of your own and your competitors' rankings, you could see immediately whether your position changes are down to a Google algorithm update, competitor action, or something else.”
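To illustrate why an average alone can deceive, here is a minimal Python sketch. The rank positions are made-up numbers, not data from the interview, and the variable names are purely illustrative. Both pages have the same average position, yet one barely moves and the other swings widely.

```python
from statistics import mean, pstdev

# Hypothetical daily rank positions for two pages (illustrative numbers only).
steady_page = [5, 5, 5, 5, 5, 5, 5]
volatile_page = [1, 9, 2, 8, 1, 9, 5]

for name, ranks in [("steady", steady_page), ("volatile", volatile_page)]:
    # Same mean for both pages; only the standard deviation reveals the swings.
    print(name, "mean:", mean(ranks), "std dev:", round(pstdev(ranks), 2))
```

Reporting only the mean would make the two pages look identical; the standard deviation is what separates them.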
You used the phrase standard deviation a few times there, and I'm sure many of us will have heard of it, but not necessarily thought about what it actually means in terms of its practical use in SEO. So, thanks to my friend Google, “standard deviation is a measure of how dispersed the data is in relation to the mean”. So it means that you're looking at results that deviate further from the average than you would expect. So how do you actually use that in things like competitor analysis, or discovering why rankings have altered significantly?
“Well, SEOs are probably using it more than they think. For example, if you look at the A/B split-testing platforms that have been popularized by people like SearchPilot, they use the standard deviation. In fact, if impressions or traffic are more than two standard deviations above what they were before, then the split test result is significant. And there are many other applications. For example, the average tells you what's normal, but if something goes one or two standard deviations above or below what was normal, then you know something significant has happened. And that's why I really cannot stop advocating the need for data science to really find out what's going on in your SEO.”
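The "two standard deviations from normal" rule of thumb can be sketched in a few lines of Python. This is only an illustration of the rule as described in the answer, not how any particular split-testing platform works internally; the function name `is_significant`, the threshold parameter `k`, and the impression figures are all assumptions.

```python
from statistics import mean, stdev

def is_significant(baseline, new_value, k=2.0):
    """Return True if new_value lies more than k sample standard
    deviations away from the mean of the baseline observations."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    return abs(new_value - mu) > k * sigma

# Hypothetical daily impressions before a change (illustrative numbers).
baseline_impressions = [980, 1010, 1005, 995, 1020, 990, 1000]

print(is_significant(baseline_impressions, 1004))  # within normal variation
print(is_significant(baseline_impressions, 1100))  # far outside normal variation
```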
So I think you alluded to the fact that you can tell by the amount of standard deviation, what was likely to have resulted in the significant change. So how can you do that?
“Well, what you need to do is look at your existing data, and you need to look at the average, and you need to look at the standard deviation, which tells you how consistent the data is around the average. Is it quite close, i.e. a low standard deviation? Or is it widely dispersed, i.e. a very, very high standard deviation? Obviously, there are formulas online that you can find through Google, and if you use Python, you can code formulas to actually get those figures. By tracking these values over time, whether it's the number of internal links or your ranks versus your competitors, things like that, this can all be automated. By tracking over time, you can see whether it's actually changing. So for example, if you're doing a split test, you’ve got a test and a control, and you want to see whether the test is actually having a significant impact, i.e. the experiment is working or totally tanking. What you would do is look to see whether the averages of the test and control are different, but most importantly, you want to make sure that the averages are more than two standard deviations apart, whatever that figure is that you calculated beforehand.”
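As a rough sketch of the test-versus-control comparison described above: the standard deviation is measured beforehand on a pre-test period, and the gap between the test and control averages is then compared with two of those standard deviations. The function name `split_test_verdict` and all figures are hypothetical, and a rigorous analysis would normally use a formal statistical test rather than this rule of thumb.

```python
from statistics import mean, stdev

def split_test_verdict(pre_period, control, test, k=2.0):
    """Compare the test and control averages against k standard
    deviations calculated beforehand on the pre-test period."""
    sigma = stdev(pre_period)
    diff = mean(test) - mean(control)
    if abs(diff) <= k * sigma:
        return "no significant difference"
    return "test is winning" if diff > 0 else "test is losing"

# Hypothetical daily clicks (illustrative numbers only).
pre_period = [100, 104, 98, 102, 96, 100]
control = [101, 99, 100, 100]
strong_test = [109, 111, 110, 110]
weak_test = [102, 103, 101, 102]

print(split_test_verdict(pre_period, control, strong_test))
print(split_test_verdict(pre_period, control, weak_test))
```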
So is relatively manual data science not going to get surpassed by AI, so that SEOs aren't going to have to know about these things in the future?
“I think in terms of split testing applications, possibly, but I think AI will basically automate all the tasks that we're not built to do. Humans are creative. So I don't think AI is going to get rid of SEOs anytime soon. I think what it is going to do, it's going to force us to do more of the things we're built to do, which is create solutions and come up with ideas and understand and connect SEO to business outcomes and things like that. That's something that AI is not going to do very well, anytime soon. Even when you look at ChatGPT and how it's being used by SEOs, ironically, to create content, ChatGPT tends to come out with a load of generic stuff that probably wouldn't rank on the first page, and in some cases, is factually incorrect.”
So, staying on being more creative as SEOs and using your human brain to decide what is the most appropriate thing to do from a strategic or business perspective: through that lens, shall we look at content marketing? How can data science really help to funnel great decision making when it comes to content marketing?
“Well, SEOs are already looking at the top 10 SERPs. They're looking at the sections that are common to the top 10 ranking content when they want to find out what it takes to rank on page one for a target search query. Where data science can help is that it can automate a lot of this. When it comes to trying to understand the search intent of keywords, there are already tools out there that compare the search engine results for a couple of keywords. What data science can do is make this a lot more consistent and reliable. Because if you imagine trying to compare or cluster 100 keywords by search intent, i.e. by the search engine results, it's going to be very manual and error prone if you try to do it yourself. Whereas data science can do it at scale, in, say, half a day or less, and in an automated way that a human or an SEO consultant cannot possibly hope to match.”
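One way to picture clustering keywords by their search results is sketched below. It is a minimal illustration under stated assumptions, not the method from the book: the similarity measure (Jaccard overlap of ranking URLs), the 0.4 threshold, the function names, and the example URLs are all hypothetical choices.

```python
from itertools import combinations

def serp_overlap(urls_a, urls_b):
    """Jaccard similarity between two lists of ranking URLs."""
    a, b = set(urls_a), set(urls_b)
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster_keywords(serps, threshold=0.4):
    """Group keywords whose SERPs overlap by at least `threshold`.
    Keywords are linked pairwise and merged with a simple union-find."""
    parent = {kw: kw for kw in serps}

    def find(kw):
        # Follow parent links to the cluster root, shortening the path as we go.
        while parent[kw] != kw:
            parent[kw] = parent[parent[kw]]
            kw = parent[kw]
        return kw

    for kw_a, kw_b in combinations(serps, 2):
        if serp_overlap(serps[kw_a], serps[kw_b]) >= threshold:
            parent[find(kw_a)] = find(kw_b)

    clusters = {}
    for kw in serps:
        clusters.setdefault(find(kw), []).append(kw)
    return sorted(sorted(group) for group in clusters.values())

# Hypothetical top results per keyword (placeholder URLs).
serps = {
    "running shoes": ["a.com/1", "b.com/1", "c.com/1", "d.com/1"],
    "best running shoes": ["a.com/1", "b.com/1", "c.com/1", "e.com/1"],
    "marathon training plan": ["f.com/1", "g.com/1", "h.com/1", "a.com/2"],
}
print(cluster_keywords(serps))
```

The first two keywords share three of their five distinct URLs, so they land in one cluster and could be targeted by the same page; the third shares none and stays on its own.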
So obviously Majestic has a lot of great data from a link perspective. So how do you use a data science driven approach to harness all this information and actually build an optimal link building strategy from all this data?
“So in my book, one of the chapters is on authority. One of the things it covers is ‘what makes a good link?’ Using Majestic data, you can absolutely apply data science techniques to find out what would be a good value for Trust Flow. You could find out the average Trust Flow and the standard deviation of that Trust Flow for your sector. Then, if we say that the average Trust Flow is 36 and the standard deviation is 10, you know that if Google was to find a backlink two standard deviations above the average, say 55 or above, then it's very likely that Google would think that link is valuable. You could then use your Majestic data to look for links that have a Trust Flow of 55 or above.”
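The arithmetic in this answer can be sketched as follows. With the figures quoted (average 36, standard deviation 10), two standard deviations above the average works out to exactly 56, which the answer rounds to "say 55". The prospect list, the `domain` and `trust_flow` field names, and the helper name are hypothetical and do not reflect Majestic's actual export format.

```python
def trust_flow_threshold(avg, sd, k=2.0):
    """Trust Flow value that sits k standard deviations above the sector average."""
    return avg + k * sd

# Figures quoted in the interview: average 36, standard deviation 10.
threshold = trust_flow_threshold(36, 10)
print(threshold)  # 56.0

# Hypothetical link prospects (field names are illustrative assumptions).
prospects = [
    {"domain": "example-a.com", "trust_flow": 61},
    {"domain": "example-b.com", "trust_flow": 40},
    {"domain": "example-c.com", "trust_flow": 56},
]

# Keep only prospects at or above the threshold.
strong_links = [p["domain"] for p in prospects if p["trust_flow"] >= threshold]
print(strong_links)
```

In practice the average and standard deviation would be computed from a sample of links in your own sector rather than typed in by hand.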
Is there any particular software that you would recommend, apart from Majestic for doing great analysis of not only reviewing links, but other data that you can use to power your SEO?
“I'm probably the wrong person to ask on software because I'm one of these crazy people that code everything, including their own tools.”
Any data science tools that you would recommend?
“Well, I mean, if you want to get started in Data Science, you would want to have a Jupyter Notebook. This is all Open Source, and it runs on any machine, Mac or Windows. And Python, which again, is Open Source. It's really just doing the hard yards of taking a basic course in Python, and in Pandas, which is the library that is used to analyze data in Python. And also, obviously, if you can take a basic course in statistics to understand how statistical tests work and things like that, that will really give you a new way of seeing SEO and all the possibilities that it opens up for you.”
If you were to sit down with an SEO, who hadn't really thought much about Data Science in the past, are there three key areas of Data Science that you would actually pinpoint as areas that they need to begin with so they can understand the basics?
“A lot of the SEO teams that I train tend to want to get the code now to do certain tasks. So I would say the top three are Keyword Clustering, Forecasting Search Volume demand or traffic, and A/B split tests. Those are incredibly popular with SEOs wanting to move into Python.”
Keyword Clustering is understanding the related keyword phrases and which keyword phrases you should be optimizing for the same page?
“Yes, indeed.”
Well, you’ve shared what SEOs should be doing in 2023. So now let's talk about what SEOs shouldn't be doing. What's something that's seductive in terms of time, but ultimately counterproductive? What's something that SEOs shouldn’t be doing in 2023?
“What they shouldn't be doing, and I've been guilty of that myself, many years ago, is making decisions purely on averages, because the average can be highly deceitful.”
So I understand this is all relating back to Data Science, so in terms of fluctuations, I just want to dig into this a little bit deeper, because obviously many things can impact fluctuations in data (seasonality, popularity, Google algorithm updates, etc). The nice thing about looking at averages is that you don't get swayed by sudden changes in standard deviation. So how do you know which sudden changes to look out for and to pay attention to, and which is something that happens to be a one off and to not be that concerned about?
“I guess if you're pressed for time, in terms of doing all the number crunching, including statistical measures, such as standard deviation, then it really comes down to experience and knowing your data. So for example, if you know the average is say, 35, and you know it tends to never go outside the sort of range between 30 and 40, and then you see an average kick out at 42, then you know something significant has happened. That's if you don't want to do the homework.”
Okay, by the sound of it, you certainly always make sure that you do the homework.
“For sure. We've got the tools now and they're free, so there really is no excuse. I'm not saying don't be creative. Be creative, ask questions of the data, but do the work.”
Andreas Voniatis is founder at Artios and you can find them over at artios.io.