The Knowledge Panel Course: Getting Your Entity Into Google’s Knowledge Vault
Script from the lesson “Getting Your Entity Into Google’s Knowledge Vault” in The Knowledge Panel Course
Jason Barnard speaking: Hi. As you have seen in previous lessons, Google has multiple Knowledge Graph Verticals, including Google Books, Google Maps, and the Web Index. Its current problem is that these verticals struggle to talk to each other horizontally: they can only communicate through a lookup table that holds a small amount of information.
Jason Barnard speaking: Ultimately, Google needs to merge these Knowledge Verticals into one single Knowledge Vault, the core of Google’s knowledge. Now, because Google Business and Google Maps drive an entire ecosystem that is absolutely fundamental to Google’s business model and its services, my educated guess is that this will be the last vertical to be fully merged.
Jason Barnard speaking: Currently, educational establishments and tourist attractions enjoy a merged presence in both the Knowledge Vault and Google Maps. This demonstrates that merging is possible, but Google is not yet in a position to do it at scale for any of its Knowledge Graph Verticals.
Jason Barnard speaking: So, whichever Knowledge Vertical you are dealing with today, the aim is to move the entity into the Knowledge Vault. At that point, your entity becomes universally accessible and usable by every algorithm, every vertical, and every service. The truth is that Google and indeed Bing are working to centralise knowledge and centralise the use of that knowledge for all their algorithms and all their products and services.
Jason Barnard speaking: The kgmid is vital here. It is the unique identifier Google uses for each entity, whatever Knowledge Vertical it is in. Whatever you hear anywhere else, this ID is absolutely key to Google. You must use it as a reference to your entity in your Schema Markup. There have been rumours that you shouldn’t use this as a reference. You definitely should, at least for Google. It doesn’t mean anything to the other bots, but to Google, it is a centralised reference to a named entity and the only way for all these verticals to communicate about that entity.
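In practice, that reference goes into the JSON-LD Schema Markup on the Entity Home. A minimal sketch is below; the domain, the `@id` anchor, and the kgmid value are all placeholders, and the kgmid-as-URL form shown here is one common convention for exposing the identifier in `sameAs`, not an official Google specification:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Person",
  "@id": "https://example.com/about/#entity",
  "name": "Example Author",
  "url": "https://example.com/about/",
  "sameAs": [
    "https://www.google.com/search?kgmid=/g/11abc0example"
  ]
}
</script>
```

The `@id` identifies the entity on its Entity Home, while the `sameAs` entry tells Google which Knowledge Graph record (kgmid) that entity corresponds to.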
Jason Barnard speaking: A quick aside on deduplicating. Google often triggers multiple Knowledge Graph entries for the same entity. Generally, this is because the entity has been identified in different verticals, and so each one triggers a kgmid. Sometimes, but not often, one vertical will duplicate. You can vastly help Google deduplicate by adding both kgmids to the Schema Markup on the Entity Home. This allows it to reconcile the two in the lookup table, and then either it will delete one and keep the other or it will merge the two, as you can see here for my duplicates. Once Google trusts the Entity Home as the authority on the entity, then deduplicating really is as simple as adding the two references to your Schema on the Entity Home.
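Concretely, deduplicating means listing both kgmids in the `sameAs` array of the Entity Home’s Schema Markup. A minimal sketch, with placeholder kgmid values and a placeholder domain:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Person",
  "@id": "https://example.com/about/#entity",
  "name": "Example Author",
  "sameAs": [
    "https://www.google.com/search?kgmid=/g/11abc0dupeone",
    "https://www.google.com/search?kgmid=/m/0exampletwo"
  ]
}
</script>
```

With both identifiers attached to one `@id`, Google can reconcile the two lookup-table entries and either merge them or keep one and drop the other.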
Jason Barnard speaking: So, although Google manages to build pseudo knowledge on the SERP in the form of its different curated Vertical Knowledge Graphs and also NLP analysis for Featured Snippets, outside the Knowledge Vault, everything else is peripheral, indirect, and temporary. Moving your entity from whatever vertical is currently triggering the Knowledge Panel into the Knowledge Vault is key, not only for Knowledge Panels, but also for Google Search and every other Google service you might want to leverage.
Jason Barnard speaking: As with everything worth doing, it is worth doing well, and it is worth bearing in mind that it will take time. Let’s look at the time factor. Why is Google’s Knowledge Vault relatively slow compared to results in the Web Index? Google is trying to build one centralised source of fundamental knowledge. They have multiple human curated Knowledge Verticals, and it would seem simple to move those into the Knowledge Vault.
Jason Barnard speaking: But there are huge problems. If the algorithm moves, say, an author from Google Books Vertical into the main Knowledge Graph, the Knowledge Vault, the explicit connection between the author and their books will be lost. Google Books is human curated. The Knowledge Vault is 100% machine learning, so the transfer from human logic to machine logic is not linear and simple.
Jason Barnard speaking: Whatever the vertical, the Knowledge Vault is adding entities all the time. The trick for Google is not to duplicate. In the case of Google Books, once the algorithm for the Knowledge Vault has confidently understood an author, it will use the lookup table and appropriate that author for itself, taking over from Google Books as the foundational source. The problem there is pretty obvious. The books are left behind.
Jason Barnard speaking: And even if the books are in the Knowledge Vault, the connection is not there. Towards the end of 2021, Kalicube data of over 2,000 authors showed that many had been moved into the Knowledge Vault, but their books were not. That meant that in the Knowledge Panel, the author still appeared, but the books were no longer included in the Knowledge Panel.
Jason Barnard speaking: This is a specific example, but it shows how this is going to work over the coming years. As Google’s global knowledge algorithm becomes confident in its understanding of a named entity, it will move individual entities from the different verticals into the Knowledge Vault, but it will not necessarily move the connected entities with them. And we have seen with Kalicube that even if it does move them at the same time, it almost always loses the connection.
Jason Barnard speaking: In the case of Google Books, Google generally moves the author alone. And in that case, the connection to the books is lost in both Google’s understanding but also in the Knowledge Panel. Even when Google moves both the books and the authors from Google Books Vertical into the main Knowledge Graph Vertical at the same time, the connection between them is usually lost, because that Google Books Vertical was hard coded with human curated relationships, whereas the Knowledge Vault has no hard coding at all. It is being purely built with machine learning, so no human has direct control, which is both scary and reassuring.
Jason Barnard speaking: Now, let’s look at a problem. If you remember from the chest of drawers analogy, each Knowledge Graph Vertical independently adds new entities to that lookup table as it creates them. Since the verticals, including the Knowledge Vault, only communicate through this lookup table, you can see there are immense opportunities for duplicates of named entities. This happens a lot.
Jason Barnard speaking: As Google moves entities from the verticals into the Knowledge Vault, it is creating large numbers of duplicates. And this is a huge problem for Google, which explains why it appears to be moving very slowly.
Jason Barnard speaking: It is also a huge problem for us, since we need to keep track of those duplicates and ensure that we move our entities smoothly from the vertical they are in, be it Books, Podcasts, Scholars, Images, or the Web Index, into the Knowledge Vault and ideally move all of the connected entities with it.
Jason Barnard speaking: My example of Google Books is great for that. When Google moves an author from Google Books Vertical into the Knowledge Vault, it will generally leave the books behind. We can actively work to ensure that it moves the books and the author at the same time and retains the relationship. As always, that requires Kalicube’s 3 step process: Entity Home, description, consistent second and third party corroboration.
Jason Barnard speaking: In this case, that means applying the process for the author and every single one of his or her books, at the same time. So, you need an Entity Home for the author and every single one of the books, ready and waiting for that move, to give yourself the best chance of moving them all together and retaining the connection, retaining their presence in the author’s Knowledge Panel.
Jason Barnard speaking: Remember Google Books, Podcasts, and so on have relationships hard coded into them. When you move into the Knowledge Vault, nothing is hard coded. It is up to you to create those relationships in the Knowledge Vault, and the work to create those relationships is always indirect. That said, once you have managed it, the relationship is arguably more solid than the human curated vertical and definitely more effective in terms of brand visibility in Google.
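One indirect way to express such a relationship is in the Schema Markup on the book’s Entity Home, pointing the `author` property at the author’s own `@id`. A minimal sketch, with all names, URLs, and anchors as placeholders:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Book",
  "@id": "https://example.com/books/example-title/#book",
  "name": "Example Title",
  "author": {
    "@type": "Person",
    "@id": "https://example.com/about/#entity",
    "name": "Example Author"
  }
}
</script>
```

Because the author node reuses the same `@id` as the author’s Entity Home, each book page corroborates the author–book relationship that the Knowledge Vault has to learn for itself.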
Jason Barnard speaking: It is really important to remember that Google will not add information to the Knowledge Vault if it isn’t super confident. And once it has added an entity and related information to the Knowledge Vault, it keeps a measurement of its confidence in that information. This is hugely understandable. As the machine learns to learn, it will learn faster and faster. Google needs to minimise the errors at the base today, since those will be vastly amplified as the algorithms grow.
Jason Barnard speaking: From a practical perspective, it is absolutely essential that we get our entities into the Knowledge Vault. Triggering a Knowledge Panel with Google Books or Google Podcasts or even the Web Index Vertical is absolutely fine. It looks good, but it is superficial. So, you should never stop there.
Jason Barnard speaking: You need to move the entity into the Knowledge Vault. And then you need to build Google’s confidence in its understanding of that entity and every detail about that entity, so that your entity can get a solid, reliable, and permanent Knowledge Panel, but also gain prominence in every single Google algorithm and every Google service from Search to Discover to Gmail to Maps.
Jason Barnard speaking: Now, Google has two ways of using entities within their search algorithms. The first, most obvious is the Knowledge Vault, the multiple Knowledge Graph Verticals, and the kgmid lookup table we have discussed.
Jason Barnard speaking: But now, a huge aspect of what Google chooses to add to its Knowledge Vault and also what it chooses to show in Knowledge Panels is Google’s understanding through the Web Index. Here we are looking at how Google understands the pages it crawls, your owned pages being the most important.
Jason Barnard speaking: Google reads HTML tables, lists, Schema Markup and, in my experience most importantly, applies natural language processing to your copywriting. Solid HTML structure, clear copywriting, and corroboration are the simple steps you need to get Google to understand your entity, move it into the Knowledge Vault, and show a meaningful, truthful, and manageable Knowledge Panel built of micro Featured Snippets, as I explained in a previous lesson.
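An Entity Home that supports those steps might be structured along these lines. This is an illustrative sketch only; the headings, description, and URLs are placeholders, and the JSON-LD from earlier in the lesson would sit alongside it in the same page:

```html
<!-- Illustrative Entity Home structure: clear heading, factual description, explicit lists -->
<main>
  <h1>Example Author</h1>
  <!-- A clear, factual entity description that NLP can parse confidently -->
  <p>Example Author is a novelist, best known for the book Example Title.</p>
  <h2>Books</h2>
  <ul>
    <li><a href="https://example.com/books/example-title/">Example Title</a></li>
  </ul>
</main>
```

The point is that the heading hierarchy, the first descriptive sentence, and the explicit list all state the same facts that the Schema Markup and third-party sources corroborate.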
Jason Barnard speaking: The last topic of this lesson is why this is all so slow. With the Web Index, we have become accustomed to daily or weekly updates. Think back to the Google Dance.
Jason Barnard speaking: Google deals with data rivers and data lakes. The Web Index and the SERP results currently run on a process of data rivers, whereby Googlebot crawls web pages, the data flows past the ranking algorithm, and the ranking algorithm can recalibrate itself more or less in real time using the stream of data provided by Googlebot.
Jason Barnard speaking: In the days of the Google Dance, Google worked on a system of data lakes.
Jason Barnard speaking: Googlebot would crawl the web, store the data in a lake, and then the SERP algorithm would only look into the data lake and update every few months. As such, search results would be very static for several months and then massively change when the SERP algorithm updated with the data lake. The Knowledge Vault is currently in the data lake scenario, so you need to be patient.
Jason Barnard speaking: That said, the explanation using the chest of drawers analogy shows that in terms of what Google is displaying in Knowledge Panels, we are closer to data rivers than we are to data lakes. However, the foundational understanding that Google uses to trigger a Knowledge Panel, especially from the Knowledge Vault, remains part of the data lake approach. Bear that in mind when working to trigger a Knowledge Panel, and bear in mind the data river approach when trying to manage the contents of a Knowledge Panel over time.
Jason Barnard speaking: This is what we do all day, every day at Kalicube using the Kalicube Pro SaaS platform. Thank you for watching, and I’ll see you soon, I hope.