Test smarter and with more frequency
Emily says: “Building on my tip from last year, SEOs need to test smarter and with more frequency. Also, when you are testing, you should definitely still be working with your product teams - that’s even more true than it was last year. Get testing with those product teams and be doing it more efficiently and more often.”
What SEO testing activities should be happening that often aren’t?
“The main thing is that you should be talking with your product teams. I should caveat what I’m saying with the fact that we work with enterprise companies, so we do enterprise testing and they can test at a much bigger scale than smaller websites.
Working with the product teams is so important because we’re moving into the world of page experience, and user experience is impacting SEO a lot more. Even without that, product teams are often a source of frustration for SEO teams. They will roll out things like client-side JavaScript frameworks that end up negatively impacting SEO.
Testing is a great way to make your business case and show them how their actions might be harming organic traffic. Show them that you’re on the same team. Your product teams are also a source of great ideas and development on your website, so you should play to each other’s strengths.
There are a lot of different names for these teams: product teams, CRO teams, development teams - others might just call them their engineering team. They are the people that are designing what your website actually looks like for your users and how it functions. They also tend to be doing CRO testing, which is the thing that often comes into conflict with SEO testing.”
What do you mean by testing smarter?
“One aspect of that is that you can use smart technologies for fast testing, if you have the capability. That might be something like SearchPilot, which can actually change your website, or it could be getting set up on the edge, if you have something like Cloudflare, to be able to make agile changes to your website. It’s using technology to bucket your pages in a way that makes the test really sensitive, so you can detect positive results from really small incremental changes.
First, you split up your pages into statistically similar buckets. That might be taking 100 pages - 50 control and 50 variant - and splitting them up so the buckets have not just equal traffic, but similar traffic patterns. Then, you can detect things like 3% uplifts that come from very minor changes that you’re making on your website. For enterprise customers, that can be a really big deal. That’s all coming from smart bucketing and modelling.
Having things like edge or SearchPilot, or a way to change your website really quickly, means you can do lots of different tests in a short space of time. A good example is something like title tag tests. What we do with a lot of customers is we come up with 10 different title tag formats that we want to test, and we churn them out really quickly. Often, we’ll land on one that’s positive.
Particularly now, with Google overriding title tags, you can’t easily predict what’s going to work. You can’t predict how Google is going to reformat your title based on the changes that you made, what parts it’s going to crop out, or if it’s even going to show up at all. You should be throwing tonnes of different things at it, and you need to have the setup to do that at scale.”
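To make the bucketing idea concrete, here is a minimal sketch in Python (all data is synthetic, and this is far simpler than the machine-learning approach Emily describes below): sort pages by historical traffic, then alternate assignment so the control and variant buckets end up statistically similar.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 100 pages with log-normally distributed daily sessions.
rng = np.random.default_rng(42)
pages = pd.DataFrame({
    "url": [f"/product/{i}" for i in range(100)],
    "avg_daily_sessions": rng.lognormal(mean=3, sigma=1, size=100).round(),
})

# Sort by traffic, then alternate assignment so the 50 control and 50
# variant pages end up with near-identical traffic distributions. This is
# a crude stand-in for "smart bucketing", which also matches pages on
# traffic patterns over time, not just total volume.
pages = pages.sort_values("avg_daily_sessions", ascending=False).reset_index(drop=True)
pages["bucket"] = np.where(pages.index % 2 == 0, "control", "variant")

print(pages.groupby("bucket")["avg_daily_sessions"].agg(["count", "sum", "mean"]))
```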
For smart bucketing and modelling, do SearchPilot and/or edge SEO allow you to test statistically similar pages and then implement the tests on a set number of pages?
“You need two different things. You need the technology to create the buckets and the model, and then you need a separate technology to make the change to the various pages. The meta CMS edge technology is what’s going to change your variant pages. Then, you need some sort of analytic tool to help you make statistically similar buckets, and also model the impact of the change while your test is running.
At SearchPilot (with the caveat that this is for enterprise websites) we have a neural network model. We use machine learning to do this, which means we can build buckets based on patterns the model learns in real time, to try and create statistically similar buckets. It’s constantly adjusting to figure out exactly what works best, at a scale that you just can’t match with humans.
Then, there’s also the ability to forecast the variant sessions. With forecasting, you’re measuring the variant pages against a forecast of their own sessions, rather than measuring directly against the control pages (which would introduce a lot of bias, because you can never split the pages up perfectly).”
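As a deliberately simplified illustration of that forecasting idea (synthetic data, with a plain linear regression standing in for SearchPilot’s neural network): fit a model on pre-test data that predicts variant-bucket sessions from control-bucket sessions, then measure the actual test-period sessions against that counterfactual forecast.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical daily sessions: days 0-59 are pre-test, days 60-89 the test.
rng = np.random.default_rng(7)
control = 1000 + 50 * np.sin(np.arange(90) / 7) + rng.normal(0, 20, 90)
variant = 0.95 * control + rng.normal(0, 20, 90)
variant[60:] *= 1.05  # simulate a 5% uplift from the tested change

# Fit on the pre-test period: learn to predict variant-bucket sessions
# from control-bucket sessions.
model = LinearRegression().fit(control[:60].reshape(-1, 1), variant[:60])

# Forecast what the variant bucket would have done without the change,
# then measure the actual test-period sessions against that forecast.
forecast = model.predict(control[60:].reshape(-1, 1))
uplift = variant[60:].sum() / forecast.sum() - 1
print(f"Estimated uplift: {uplift:+.1%}")
```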
How could you implement this for a relatively small website, perhaps one that’s a few hundred pages and WordPress-based?
“There are a lot of different tools online for doing that. There is paid software that’s a bit cheaper than the enterprise tools, and there are blog posts on how to create buckets and run a causal impact analysis, for example. All of that is going to be better than what we have done historically in SEO, which is: we make a change, we see the graph go up after that date, so we call the change positive. You want to do some sort of controlled testing to get an idea of whether that traffic increase was because of an algorithm update or because of the change that you made. That’s what you want to get a better answer for.
It can seem like you’re losing something when you explain it that way, because people stop being able to point at graphs that go up and to the right and say, ‘This is because of the change we made.’ The selling point is that you may well be making changes that positively impact your traffic while an algorithm update or something seasonal happens at the same time, so the graph looks negative. In reality, that dip isn’t because of the thing you did, and you’re missing out on gains by not being able to separate those effects.
For smaller websites, it’s a bigger challenge because you’re working with less traffic and less data. It’s still worth doing, though. Something we do with customers that have lower traffic levels is simply run the tests for longer. Running these tests for just a week is probably not going to be possible at low traffic levels; you want to be looking at something like six weeks. The key is to get as controlled a testing environment as you can.”
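For a smaller site, the causal impact analysis Emily mentions can be run with the open-source CausalImpact library. A rough sketch using a Python port such as pycausalimpact follows (the data is synthetic and API details vary between ports, so treat this as a starting point rather than a recipe):

```python
import numpy as np
import pandas as pd
from causalimpact import CausalImpact  # e.g. the pycausalimpact package

# Hypothetical data: daily organic sessions for the changed pages (y) and
# for a comparable set of untouched pages (x) that acts as the control.
rng = np.random.default_rng(3)
x = 500 + rng.normal(0, 15, 100)
y = 1.2 * x + rng.normal(0, 15, 100)
y[70:] += 30  # simulate an uplift after the change goes live on day 70
data = pd.DataFrame({"y": y, "x": x})

pre_period = [0, 69]    # days before the change
post_period = [70, 99]  # days after the change

ci = CausalImpact(data, pre_period, post_period)
print(ci.summary())  # estimated effect with 95% confidence intervals
```

On a low-traffic site, the post period would cover the six weeks or so that Emily suggests, so the model has enough data to separate the change from noise.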
If you have a page that is performing reasonably well in terms of traffic, but you want to tweak the content, is it better to test those changes before implementing them?
“I think so. As an example from our own website, which gets very low traffic relative to our customers’, our CEO (Will Critchlow) did some basic keyword research for our Case Studies pages. He made some changes to the titles and H1s, and we could see some improvement from that date - and, crucially, he knows exactly when he made the changes and which pages he made them to.
Those improvements might still not be because of the change that we made but, once you at least know the best-case scenario, you can start to figure out what is possible within your website and how you can make changes to the best of your ability within the constraints that you have. That is going to work out better than just looking back and seeing if you made the correct change after the fact. Try to think ahead of time about what you can do, and implement as much as you can.”
How do you look at statistical significance, and how do you know if you’ve actually won with your new scenario?
“Our neural network model, as we call it, generates confidence intervals for us. They are the foundation when we’re talking about statistical significance. What the confidence intervals say is, ‘We are 95% confident that the impact of the change will fall between this lower end and this top end.’
Something we’re all more familiar with would be the COVID vaccines, for example. For a COVID vaccine, they would have said something like, ‘We’re 95% confident (the scientific gold standard) that the efficacy rate will be between 75% and 95%.’ They would have set a threshold that the bottom of that range needs to hit, so they can say that, at worst, it’s going to be at that threshold and, at best, it’s much higher. That’s what we’re talking about when we talk about statistical significance. To translate that to SEO testing, we might say something like, ‘We’re 95% confident this will be between a 1% and a 15% increase in organic traffic.’ That means we’re very sure it will be positive, but the improvement could be anywhere within that range.
We generate all of that with the neural network model, but there are other tools you could use as well. CausalImpact is something you can look into. You can input the data that you have, and it will generate those confidence intervals for you. It will help you with your business case conversations as well. If you go in just saying that top-end number, your boss is going to expect the highest possible increase, even if you have a really wide confidence range.”
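If you don’t have a modelling tool to hand, a simple bootstrap gives a rough confidence interval for an uplift estimate. A minimal sketch with hypothetical numbers:

```python
import numpy as np

# Hypothetical per-day uplift estimates from a six-week test, e.g. daily
# values of (variant sessions / forecast sessions - 1).
rng = np.random.default_rng(1)
daily_uplift = rng.normal(loc=0.08, scale=0.10, size=42)

# Bootstrap: resample the days with replacement many times, then take the
# spread of the resampled means as a 95% confidence interval.
boot_means = [rng.choice(daily_uplift, size=42, replace=True).mean()
              for _ in range(10_000)]
low, high = np.percentile(boot_means, [2.5, 97.5])
print(f"95% confidence interval for the uplift: {low:+.1%} to {high:+.1%}")
```

Reporting the whole interval, rather than just the top end, sets the expectations Emily describes.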
If 95% statistical significance is the gold standard, do you stop running the test once you’ve achieved that?
“You want to leave it a couple of days to make sure the result holds but, if it does, you can stop there. Going back to the idea of testing smarter - it’s important to remember that we’re doing business, not science. The 95% level was created by the scientific community, so it’s technically a bit arbitrary. They have good reasons to have that standard for something like a vaccine; if it goes wrong, it could kill someone. For our cases, something like 90% or 80% confidence could be fine.
We talk about using a matrix for your decision-making, based on both what the result says and how strong your hypothesis is. Say you’re running a test where you’re adding a really high search volume keyword to the title tags - a well-established way to improve your rankings in SEO. If that was positive at the 90% confidence interval, you might decide that the odds are good and you should roll it out, rather than miss out on those gains because you’re aiming for a scientific gold standard.”
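One way to picture that matrix (a hypothetical rule of thumb, not SearchPilot’s actual policy): lower the confidence bar as the prior evidence for a tactic gets stronger.

```python
# Hypothetical decision matrix: the confidence level a test must reach
# before you roll out the change, given how well-established the tactic is.
ROLLOUT_THRESHOLDS = {
    "well_established": 0.80,  # e.g. adding a high-volume keyword to titles
    "plausible": 0.90,
    "speculative": 0.95,       # hold novel ideas to the scientific standard
}

def should_roll_out(confidence: float, hypothesis_strength: str) -> bool:
    """Return True when test confidence clears the bar for this hypothesis."""
    return confidence >= ROLLOUT_THRESHOLDS[hypothesis_strength]

print(should_roll_out(0.90, "well_established"))  # True
print(should_roll_out(0.90, "speculative"))       # False
```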
Should you always be testing or is this something that you should just do every few months or so?
“Something else that Will, our CEO, has been speaking a lot about lately is what he’s calling ‘Moneyball SEO’. It’s similar to how statistical analysis is used in professional sports. In basketball, for example, you’re starting to see far fewer shots from mid-range - they’re either taken next to the rim or from the three-point line. Statistical analysis showed that mid-range shots aren’t worth the two points they give you: the odds of making them are worse than when you’re close to the rim, and you score one point fewer than you would from beyond the three-point line.
Similarly with SEO, what we’re finding at SearchPilot is that a lot of the ‘mid-range shots’ we’re taking (like updating meta descriptions) aren’t really giving us much in return, whereas the things we implement from positive tests - the shots closer to the rim - are what you should be focusing on.
Focus on things that have evidence to say they will actually work, whether that’s from something like SearchPilot or from running your own testing methodology. Things we find tend to work, whether you’re testing or not, are new content and links. Those do well for SEO and should always be part of your strategy.
When it comes to those mid-range things, if you can’t test them or find a good reason to actually implement them and invest resources in them, you shouldn’t be spending your time there. A lot of SEOs could benefit from thinking a bit more like that. If you can, you should be testing.”
What shouldn’t SEOs be doing in 2023? What’s seductive in terms of time, but ultimately counterproductive?
“I think SEOs should stop updating templated content. We see this on enterprise websites all the time: at some point they generated a template, and now they spend a lot of time trying to improve that template across the whole website. What we’re seeing is that, very often, the new templated content isn’t any better than the templated content they had before.
If you add new content, or really good quality content, then that tends to move the needle. Often that’s written by copywriters, or by very advanced AI technologies that some of our customers use to create unique localised content. Replacing boilerplate content with new, improved boilerplate content rarely moves the needle, but it’s very resource-intensive for SEOs. On the other hand, just updating your title tags tends to make a big difference - though you should definitely test it, because we see swings both ways.”
Emily Potter is Head of Customer Success at SearchPilot and you can find her over at @e_mpotter on Twitter.