Learn how to produce SEO output at scale with generative AI – intelligently and methodically
Ben says: “Don’t be put off by ChatGPT horror stories. Embrace generative AI at scale for your SEO output – but do it intelligently and methodically.
LinkedIn is awash with horror stories about folk who jumped on generative AI early and used it to create content and other SEO assets at scale. Then, they were hit with Google’s helpful content updates, or they disappeared from search results entirely through new algorithmic changes that are better able to recognise machine-generated content.”
Are these websites that are creating hundreds of thousands of pages with AI, or people with genuine intent who are only using it on a few pages?
“The examples that are shared most often are people who’ve deliberately tried to game the system by generating huge amounts of automated content.
As early adopters, they weren’t necessarily doing it in a particularly evolved way. In retrospect, although they were trying out a new technology, it looks like ‘suck it and see’ wasn’t the right approach.
There may well be more innocent, small-scale cases, but those people might not be in the SEO community because they’re not doing it in a way that is deliberately trying to monetise SEO. They might not even be aware that their efforts are undermining their performance.”
What does using AI correctly at scale look like?
“Be smart and prime generative AI in a sensible way that adds value. A lot of the early adopters were using simple prompts and giving the AI too much freedom. What I’ve found to be successful has involved priming generative AI with its own corpus of information in advance.
For example, you can use a crawler to extract elements of websites and then use an AI to isolate, from that bulk scraping, the relevant bits of information and the relevant facts, and combine it all to add value that’s hard to find elsewhere.
Then, you can have a second AI generate fresh content out of this corpus of material that you have curated. The difference is that you’re not just asking generative AI to produce something out of nowhere. You’re using your technical expertise to scrape information that’s publicly available, and then have AI extract only what’s relevant and what you want it to draw on, and finally have a second AI produce information from that.
That gives you much more control over the inputs that generative AI will draw on. It allows you to ensure the quality and accuracy of the information and have much tighter control over content structure. Overall, it’s a human-curated input but without the human hours put into it.”
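As a rough editorial illustration of the process Ben describes – not his actual implementation – the flow can be thought of as three stages, sketched here in Python with placeholder function names:

```python
# Hypothetical skeleton of the three-stage flow described above; the names and
# signatures are illustrative assumptions, not a real codebase.

def scrape_page_bodies(urls: list[str]) -> list[str]:
    """Stage 1: bulk-scrape body content with an off-the-shelf crawler or HTTP client."""
    raise NotImplementedError

def curate_facts(page_body: str) -> dict:
    """Stage 2 (first AI): isolate only the relevant, well-formed facts from the scrape."""
    raise NotImplementedError

def generate_copy(all_facts: list[dict]) -> str:
    """Stage 3 (second AI): write fresh content drawing only on the curated corpus."""
    raise NotImplementedError

def run_pipeline(urls: list[str]) -> str:
    curated = [curate_facts(body) for body in scrape_page_bodies(urls)]
    return generate_copy(curated)
```

The two AI stages are fleshed out in the car retail example below.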
With the initial data, are you only using private resources like PDFs that have been produced by a company but not released to the general public?
“It depends what LLMs you have at your disposal. Do not put sensitive in-house documentation into ChatGPT, for example. In most use cases, there are ample publicly available datasets that folk have not previously found it easy to extract information from, and that the OpenAI API will help you leverage.
A content format that this process works really well for is comparison content. Here’s a fictitious example of a car retailer.
You are retailing cars, and there are datasets out there in the form of product spec sheets. Previously, you might have used an SEO crawler to extract certain pieces of information from a page, but it’s fiddly: if a page template changes, you can’t extract the information very well. You spend a lot of time troubleshooting, and there’s user error at the source – information entered into the wrong fields in the first place – which makes your extraction less reliable.
But notice that I’ve suggested using AI in two ways. One of these is to extract information in the first place. Practically, if you want to produce editorial content comparing two vehicles, you can use an off-the-shelf SEO crawler to broadly extract all content in the body of a page. Then, you prime the AI to extract the bits of information from within this greater body that are the most useful to you: engine size, colour, year, etc.
You can train the AI to recognise what different bits of information look like – an engine format or a year format, for example – and then get it to extract those bits of information from the page. Before you know it, you’ve accumulated a single dataset from numerous different sources: a unique combination of information, with tight control over input quality.
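As a hedged sketch of what that first, extraction-focused pass might look like – using the openai Python package, with a model name, field list and prompt wording that are illustrative assumptions rather than Ben’s actual setup:

```python
import json
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

EXTRACTION_SYSTEM_PROMPT = (
    "You extract vehicle facts from scraped page text. "
    "Return a JSON object with keys: make, model, year, engine_size, colour. "
    "Use null for anything not stated explicitly - never guess."
)

def extract_vehicle_facts(page_body: str) -> dict:
    """First AI: pull only the fields we care about out of the bulk-scraped body copy."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        temperature=0,        # extraction should be deterministic, not creative
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": EXTRACTION_SYSTEM_PROMPT},
            {"role": "user", "content": page_body},
        ],
    )
    return json.loads(response.choices[0].message.content)
```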
If you then use a second AI to generate a piece of content in a prescribed format – using only that information you’ve primed a previous AI with – you can guarantee that the information is of high quality and in a prescribed structure. You will get a much more successful output than the early adopters who would have used AI that was drawing on nothing in particular.”
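Again as an illustrative sketch rather than Ben’s code, the second, editorial AI might then be primed only with the curated facts and a prescribed structure:

```python
from openai import OpenAI

client = OpenAI()

EDITORIAL_SYSTEM_PROMPT = (
    "You write car comparison copy to SEO best practice and basic brand guidelines. "
    "Use ONLY the facts supplied in the user message - never add specifications "
    "from memory. Structure: intro paragraph, spec-by-spec comparison, short verdict."
)

def write_comparison(car_a: dict, car_b: dict) -> str:
    """Second AI: generate content in a prescribed format from the curated facts only."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        temperature=0.4,      # a little latitude for phrasing, none for facts
        messages=[
            {"role": "system", "content": EDITORIAL_SYSTEM_PROMPT},
            {"role": "user", "content": f"Car A: {car_a}\nCar B: {car_b}"},
        ],
    )
    return response.choices[0].message.content
```

Keeping the extraction and editorial prompts in entirely separate calls, rather than one do-everything prompt, reflects the separation of concerns Ben expands on below.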
Why do you use multiple AIs for these two tasks?
“Imagine you are using the OpenAI API (which is accessible to anyone for a very small fee). With this, one AI is trained on data extraction: picking out bits of information from publicly available datasets where that information follows certain formats or meets certain criteria. The other is trained on editorial execution according to SEO best practice – a harder skill to execute – by giving it basic universal brand guidelines.
Because the two use cases for the OpenAI API are so different, I’ve found it valuable to separate them completely. Two specialist use cases, two separately trained models. Otherwise, you’re layering on lots of different types of logic and purpose. Generally, one AI is less successful when it’s doing two separate things. What’s important is that we’ve achieved an output that’s both fully human-curated and completely automated.
It’s human-curated in that we, as people, have defined the types of information that need to go into this. We’ve also extracted that type of information, and we’ve organised it in a human-defined way.
Everything about it replicates a human-led process; you’re just automating the execution. It’s much more valuable and authentic than simply prompting a generative AI to write a piece of content for you.”
Where does this fit into a content marketing strategy and how does it impact the organisation as a whole?
“It is a really good fit for marketing-led initiatives that are focused on SEO or growth. It’s less attractive for brand-oriented projects. Organisations and marketers have a number of different reasons for creating content or publishing landing pages. Not all of them are going to be for growth in the way that SEO material is for growth - and that’s fine.
This AI-led system works really well for growth projects because, if you’re doing a project which requires executing a very large number of pieces in a very small number of repeatable formats, then a single round of bulk scraping, sorting, and asset execution is incredibly efficient and effective.
However, if you’re doing brand-led content then you may not need or want to create output that follows a repeatable format. Fully bespoke, non-scalable, template-resistant outputs might even be desirable for your campaign! Basically, projects not designed for scale do not suit automation in this way.
If you’re not sure whether your marketing output will suit this kind of AI-led system, you should encourage stakeholders to help you define what kind of data points might be useful for you, and what insights you might want to provide with your output.”
How do you decide which type of content you’re going to produce for a brand?
“For an initiative that this type of technology can support, it’s important not to just look at traditional things like total addressable search volumes. It’s more important than ever to look at the types of material that search engines are favouring for any given query format.
In my personal experience, it’s never been easier to see what type of material is likely to rank, if you properly examine the search results. Even as recently as four or five years ago, search engines would provide more of a variety of material types for a given search result. Now, it’s more monolithic.”
Do you use software to analyse the SERP and determine what type of content is likely to be the most successful?
“A number of different tools will try and categorise a query as informational, navigational, purchase intent, etc. – and that can give you great inferences on what format of content you ought to be producing. Also, by extension, that will tell you whether you can leverage the technology I’m describing here, because of the type of format that seems to be rewarded.
However, I find all the most popular off-the-shelf tools a little bit hit-and-miss in this regard. A helpful approach would be to treat the categorisations or intent assumptions of tool sets as a starting point, but then validate that through manual checks for the most valuable keyword format types for your business.
By eyeballing that, you can verify immediately how many of the top 10 results for that query are served by X or Y types of content. It could be that Y content type lends itself to the automation processes that I’m describing here. That’s the sort of logic tree you can go through.”
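A minimal illustration of that logic tree, with hypothetical hand-labelled results standing in for the manual SERP review:

```python
# Hypothetical labels from manually reviewing the top 10 results for a query;
# in practice these come from eyeballing the SERP or from a rank-tracking tool.
top_10_content_types = [
    "comparison", "comparison", "review", "comparison", "comparison",
    "comparison", "forum", "comparison", "review", "comparison",
]

comparison_share = top_10_content_types.count("comparison") / len(top_10_content_types)

# If a repeatable, data-driven format dominates, the query is a candidate for
# the automation process described earlier; bespoke formats probably are not.
if comparison_share >= 0.5:
    print("Comparison pages dominate - a good candidate for templated automation")
else:
    print("Mixed or bespoke formats - probably not worth templating")
```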
If an SEO is struggling for time, what should they stop doing right now so they can spend more time doing what you suggest in 2025?
“Stop being drawn into busy work – the work that takes a lot of time and has a high perceived value but doesn’t affect the bottom line.
For example, I have to show great discipline with which tool regression alerts I follow up on every day. As anybody reading this would know, we’re subscribed to so many different tool sets to support our online growth efforts. It’s very sensible to set up notifications for changes, issues, opportunities, etc. - but my inbox is flooded with these on a daily basis.
If I were to follow up on every instance of regression, I would have no time to pursue more innovative tactics like those which I’m describing. We should be disciplined with what kind of regression to follow up on and what to put in a backlog to action in two years’ time.”
Ben Howe is Global Web and Digital Lead at AXA - Global Healthcare. You can find him on LinkedIn: linkedin.com/in/ben-h-5710195a.
Ben’s views represent direct first-hand experience, but do not reflect practice or policy at AXA.