Josh says: “With the incredibly quick adoption of AI-driven answers in search engine results, it's more important than ever for people to take a look at how they're showing up in Wikipedia, because of the huge influence it's having on the results that people are seeing today.”
We were talking about the subject beforehand and you also mentioned Wikidata. Is Wikidata something that supplies Wikipedia, or is it something a little bit different?
“It's one of the many Wikimedia properties and it's a foundation for a lot of information in Wikipedia; for example, Wikidata is designed to be just definitive facts like the earnings for a company in 2023, the official name of a product, or the location of a building. That then gets syndicated, essentially, across all sorts of interesting places.
Given all the different approaches the LLMs are taking to when they gather their data, it's important to know what the cut-offs are. You need to look at that and make sure all that information is accurate, because AI makes things a little less transparent than search results in terms of where the information is coming from. It's important to backtrack and reverse engineer where some of these things you're seeing in results are coming from.”
Is it true that a brand should be updating and editing its own Wikidata?
“Their data should be updated. There are various ways to do that, and the Wikipedia/Wikimedia community is a little bit tricky in terms of that, but the short answer is yes.
The mechanics of that can vary from company to company, depending on their situation, but in general you want to ensure that it has the most accurate and up-to-date information possible.”
I thought that perhaps a brand could update its own Wikidata, but perhaps not its own Wikipedia page.
“Yeah, it tends to work the same way. There's a robust mechanism for providing feedback if you're a brand to say this information needs to be updated, and that's usually the path brands will go down just because updating data tends to be much less controversial than things that are more opinion-based.”
How do Wikipedia and Wikidata impact search results?
“Because they are essentially copyright-free, in that they have a Creative Commons license, they can be used a lot more freely than other information. When you see OpenAI paying a large amount of money to use Reddit content to train models, that's because Reddit content comes with restrictions that Wikipedia doesn't have. So, you're seeing Wikipedia being used in more and more places because it is one of the biggest sources of information and it doesn't have a lot of the restrictions that other communities might have. You’ll see it pop up in very nonintuitive ways like TikTok, for example, which relies on Wikipedia data in certain circumstances to define things.”
I thought that perhaps Wikipedia was becoming slightly less important because when you search for something nowadays, you're slightly less likely to see Wikipedia results straight away, especially as Google has established a partnership with Reddit and is trying to bring in other data providers. Is that a fair presumption or a fair analysis?
“I would say that I intuitively feel the same thing: Wikipedia has a little bit less high-profile visibility. I would guess the numbers would bear that out if anyone had a way of measuring it. However, that lessening of visibility is probably offset by the fact that those Wikipedia pieces of information are now showing up in Google's SGE or other places.
The Reddit ‘bump’ may not have as much in the way of legs, given that people have observed that the search experience may not be great if you get just Reddit results. I think the original idea made sense: let's get some first-person experiences. But if you're getting a Reddit thread from a decade ago, or even a coupon code from three years ago, that's not a great search experience.
You can argue that it meets the criteria of a first-person experience, but there's a lot of upheaval going on. It'll be interesting six months from now to see which of those things have stuck and which haven't.”
Perhaps it's just the case that Google is actually using Wikipedia and Wikidata just as much, but they're not giving them as much credit as they used to.
“I think that's probably accurate and we're going to see that continue evolving. I think Google has more pressure on it now than it's had at any time in the last decade. How that manifests itself in terms of what it's serving will be fascinating.”
You touched upon SGE there. How does the new AI world use Wikipedia and Wikidata?
“It seems to be a frequently cited source in terms of helping generate those answers. Sometimes the SGE results are actually citing their sources. In other cases, we've done a number of audits and you can see that the information is being pulled from Wikipedia or synthesized based on Wikipedia, whether or not credit is actually given.
I think that's because there's a feeling that Wikipedia can be a source of truth for all these engines. If you're training them, you can assume that Wikipedia's reliability is very high, and other sources are much more challenging to determine if you're trying to program something on the back end.”
I guess it's quite tough to find a competing source of data or a competitor to Wikipedia for Google to provide an alternative answer to its users.
“The reason Wikipedia has made it 20 years is because it occupies this unique niche. I think that's going to be increasingly important if you're thinking about what to use to train your LLMs. Sure, I can feed Reddit into it, but there will be a certain amount of garbage, which will increase over time as people realize the influence Reddit is having. Wikipedia, by contrast, has a pretty good mechanism for filtering out marketing spam. So, seeing what sources become reliable for these SGEs will be interesting.
However, I do think you're going to start to see pushback. We’re already seeing it with Reddit, where Reddit went from this cool, quirky community where you could get first-hand information to a situation where they've probably gotten more negative press in the last six months than they've had in a long time just because of the Google thing, right?
There's always been a certain amount of crap on Reddit, which didn't matter before because, for the most part, no one was searching 10-year-old Reddit threads. That's not how people used it. But now that they're showing up on Google, I think it's hurting Reddit's brand. I suspect we're going to see a lot of evolution of this.”
Do you have any opinion on whether or not users tend to focus on that singular SGE result and take that result as the definitive answer for the query that they're searching for?
“I think it's like a lot of other searches, where if you're looking for a quick answer, it's helpful, like a lot of the things where Google had zero click results, such as if I'm looking for the score of a ballgame or the weather tomorrow. An SGE answer there would be fine with me.
However, I think people still tend to keep scrolling because there's not quite that trust level yet, especially with people who are using it on a day-to-day basis. I think that my parents are probably scrolling past and they're going right to the traditional results unless there are some cases where it's helpful.
You're currently trying to shift almost a quarter century of search behavior, which can be done, but I think it's going to be a more gradual process.”
In terms of the fields and the data that you should be optimizing Wikidata and Wikipedia for, what are the elements you're seeing brands missing out on or not optimizing for?
“There are two common things for brands. The first is having lots of facts and figures about their company which are several years out of date for whatever reason.
The other thing is listing products or product features that are two or three years out of date. That's going to hurt because people are just going to say, ‘Oh, it doesn't have this feature,’ or ‘they only have a few products in this area.’
If this information is flat-out wrong or outdated, but it's being presented as if it's new, I think that will have a negative impact.”
Is reviewing these listings once a year enough for a brand?
“I'd probably say quarterly or twice a year. It's not a huge lift unless you're just a giant company. You can really go through a few of the different search engines and a few of the different LLMs and just see if anything glaring stands out to you. In the SEO space, you're probably very aware of what you're trying to highlight and probably spend a lot of time researching what people are looking for information about. So, try to reverse engineer that and ensure the information's up to date.”
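As an illustration of the kind of periodic review described above, here is a minimal Python sketch that flags facts whose most recent dated value is several years old. It assumes the brand's Wikidata item has already been downloaded as JSON (Wikidata serves this at `Special:EntityData/<item ID>.json`) and that dated facts carry the "point in time" qualifier (P585). The sample record, the two-year threshold, and the function names are made up for the example; this is not a tool Josh describes.

```python
from datetime import date

POINT_IN_TIME = "P585"  # Wikidata qualifier "point in time"


def statement_year(statement):
    """Return the year of a statement's point-in-time qualifier, or None."""
    for qualifier in statement.get("qualifiers", {}).get(POINT_IN_TIME, []):
        value = qualifier.get("datavalue", {}).get("value", {})
        time_str = value.get("time", "")  # e.g. "+2021-00-00T00:00:00Z"
        # Only handles ordinary four-digit years; anything else is skipped.
        if len(time_str) >= 5 and time_str[1:5].isdigit():
            return int(time_str[1:5])
    return None


def stale_statements(entity, max_age_years, today=None):
    """List (property, newest_year) pairs whose newest dated value is too old."""
    today = today or date.today()
    stale = []
    for prop, statements in entity.get("claims", {}).items():
        years = [y for y in map(statement_year, statements) if y is not None]
        if years and today.year - max(years) > max_age_years:
            stale.append((prop, max(years)))
    return sorted(stale)


def dated(year):
    """Build a bare-bones statement dated to the given year (sample data only)."""
    time_value = {"time": f"+{year}-00-00T00:00:00Z"}
    return {"qualifiers": {POINT_IN_TIME: [{"datavalue": {"value": time_value}}]}}


# Hypothetical, heavily trimmed entity record for an imaginary company.
SAMPLE_ENTITY = {
    "claims": {
        "P2139": [dated(2019), dated(2020)],  # total revenue
        "P1128": [dated(2023)],               # employees
        "P856": [{}],                         # official website (undated)
    }
}

if __name__ == "__main__":
    for prop, year in stale_statements(SAMPLE_ENTITY, 2, today=date(2024, 6, 1)):
        print(f"{prop}: newest value is from {year}")
```

Undated facts such as the official website are ignored here, so a check like this only supplements a manual read-through of the item and the article.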
Are there any other Wiki sites that SEOs should be aware of and optimize for, or are Wikidata and Wikipedia just the Goliath in the room?
“There are many different language editions, so depending on how international your company is, you probably want to look across all of those to see what's showing up, as they are often updated less frequently.
While the English language Wikipedia is the Goliath, if you're operating in other countries, it's also good to have maybe a native speaker look at those pages and see if they differ materially. Some are very well done and updated, while others are very loosey-goosey. So, it's worth checking.”
Does this relate to schema markup as well? Wikipedia is one source from which a search engine gets and understands brand data. But another source is a search engine crawling your own site, and schema can help search engines understand that. Do you somehow link what's happening with the schema markup on your site with what's happening on Wikipedia?
“The more you have your information well organized and have schema set up, the more beneficial it's going to be in many different ways, especially since some Wikipedia editors are just going to look at your site for the source of truth. If you've got the schema there, having organized information makes things much easier for all the various tools and crawlers. It goes hand in hand: having that will make life considerably easier in various ways.”
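One common way to connect the two is schema.org's `sameAs` property, which lets an `Organization` entry on your own site point at the pages that describe the same entity elsewhere. The Python sketch below builds such a JSON-LD snippet. The company name, URLs, and Wikidata item ID are placeholders, and the helper names are invented for the example; it is one possible setup, not something prescribed in the interview.

```python
import json


def organization_jsonld(name, url, same_as):
    """Return a schema.org Organization description as a JSON-LD string."""
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        # sameAs lists other pages that identify the same organization.
        "sameAs": list(same_as),
    }
    return json.dumps(data, indent=2)


def as_script_tag(jsonld):
    """Wrap a JSON-LD string in the script tag used to embed it in a page."""
    return '<script type="application/ld+json">\n' + jsonld + "\n</script>"


if __name__ == "__main__":
    # Placeholder values: swap in the real brand, site, and Wikimedia URLs.
    snippet = organization_jsonld(
        name="Example Corp",
        url="https://www.example.com",
        same_as=[
            "https://en.wikipedia.org/wiki/Example_Corp",
            "https://www.wikidata.org/wiki/Q0000000",
        ],
    )
    print(as_script_tag(snippet))
```

Generating the markup from a single source of company facts, rather than hand-editing it per page, makes it easier to keep the site consistent with what appears on Wikipedia and Wikidata.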
You've shared what SEOs should be doing in 2024, but what's something that SEOs shouldn't be doing in 2024?
“Constantly checking the results of your actions, which I'm guilty of myself. Are the search rankings changing? Have favorite queries changed? And so on.
Instead, focusing on the process and not necessarily the output can save a lot of time that isn't productive and only serves the small dopamine hit of seeing a change you may have made.”
Josh Greene is CEO of The Mather Group, and you can find him at themathergroupllc.com.