The Knowledge Panel Course: How Google’s Knowledge Graph Works
Script from the lesson The Knowledge Panel Course
Jason Barnard speaking: Hi there. In this lesson, I’ll explain how a Knowledge Graph works. Understanding this is hugely helpful when adding information to a Knowledge Graph from the outside, which is what we are doing here with Google.
Jason Barnard speaking: A Knowledge Graph is built up of nodes, attributes, edges, and paths. We’re going to start with nodes, attributes, and edges. Paths are super important once you have the initial node set up with attributes and some edges. I know this might sound like gobbledygook, but behind the words, the concepts are actually very simple.
Jason Barnard speaking: Nodes are entities, a person, a road, a company, a podcast, and so on and so forth. Attributes are properties, height, colour, age, and so on. And the edges are relationships between two entities, works for, parent of, founder of, and so on.
Jason Barnard speaking: Importantly, this is how the human brain functions, which makes the entire concept of Knowledge Graphs very easy to understand. As humans, we understand the world through entities and their relationships to other entities. “Jason Barnard works for Kalicube.” And as you might have noticed, simple grammar also functions this way. Entity-relationship-entity is a simple subject-verb-object. “Jason Barnard voiced Boowa.” Boowa is the cartoon blue dog I played in a TV series, by the way.
Jason Barnard speaking: And attributes are often adjectives. “The red bus drives in London.” Here, red is an attribute, bus is an entity, drives is a relationship, and the relationship is with the entity London.
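To make the node, attribute, and edge vocabulary concrete, here is a minimal Python sketch. It assumes a toy in-memory structure (the `Node` class, the `nodes` dictionary, and the `describe` helper are illustrative names, not anything Google publishes) and only reuses the example sentences from the lesson.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """An entity; its attributes are stored as simple key/value properties."""
    name: str
    attributes: dict = field(default_factory=dict)


# Nodes (entities) with their attributes (properties).
nodes = {
    "Jason Barnard": Node("Jason Barnard"),
    "Kalicube": Node("Kalicube"),
    "Boowa": Node("Boowa", {"colour": "blue"}),
    "bus": Node("bus", {"colour": "red"}),
    "London": Node("London"),
}

# Edges (relationships) as subject-verb-object triples.
edges = [
    ("Jason Barnard", "works for", "Kalicube"),
    ("Jason Barnard", "voiced", "Boowa"),
    ("bus", "drives in", "London"),
]


def describe(subject, relationship, obj):
    """Turn a triple back into a sentence, using the subject's colour as an adjective."""
    colour = nodes[subject].attributes.get("colour")
    label = f"{colour} {subject}" if colour else subject
    return f"{label} {relationship} {obj}"


sentences = [describe(*edge) for edge in edges]
for sentence in sentences:
    print(sentence)
```

Each triple reads as subject, verb, object, and the attribute on the bus node surfaces as the adjective in "red bus drives in London".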
Jason Barnard speaking: Everything in threes. When teaching children, three is often referred to as the magic number. And it is magic. Our brains work in threes and so do Knowledge Graphs. Here is a simple Knowledge Graph Google uses to illustrate theirs.
Jason Barnard speaking: In terms of how Google is growing its Knowledge Graph, we can use an analogy of a tree. Google built the trunk of the tree using human curated sources, Freebase, IMDb, MusicBrainz, Wikidata, and Wikipedia. The Knowledge Graph is built with this solid, reliable information at its core. Google then started adding some thick, solid branches using its own knowledge bases such as Google Podcasts, Google Books, Google Scholar, and so on. It’s a little more complex than that, but I’ll go into a little more detail about Google’s own knowledge sources in the lesson about the different Knowledge Graph verticals that trigger Knowledge Panels.
Jason Barnard speaking: Google also included some information from D&B, Crunchbase, and several other third party sources. Quite what their criteria or methods were is unclear. But the crux here is that so far, this is all human curated, i.e. entities, attributes, and relationships that have been checked by humans. Obviously, it is not 100% accurate, but it is sufficiently reliable for Google to use it to show its knowledge algorithms what facts look like, so that it can then train the algorithms to figure out the facts on their own.
Jason Barnard speaking: At Kalicube, we have data going back to 2017. And that data suggests that initially, if the information wasn’t corroborated by one of the core sources mentioned above, it wouldn’t be added to the Knowledge Graph.
Jason Barnard speaking: So, the next step was to add some additional branches. From around 2020, Google has given the algorithms the freedom to establish facts and add to the Knowledge Graph using information from Google’s web index. Initially, this seems to have been limited to a seed set of relatively reliable sources, both the generally recognised ones such as Amazon, LinkedIn, Crunchbase, Rotten Tomatoes, Spotify, and so on, but also some hyper niche websites that had established themselves as authorities in that niche.
Jason Barnard speaking: At Kalicube, we had significant successes independently of the obvious sources, including using my own website, jasonbarnard.com, for the Boowa & Kwala characters and my music from the 90s, The Barking Dogs.
Jason Barnard speaking: Today, Google is allowing the knowledge algorithms to add twigs to the solid branches using data from the web index to establish facts without human intervention. It is learning on its own, and that is huge.
Jason Barnard speaking: This visual shows how much information in the Knowledge Vault comes from where. As of September 2022, 71% is the human curated trunk of the tree, 15% is Google’s human curated solid branches, 12% is machine-defined facts from a seed set, and 2% is twigs from the open web. Now, 2% seems very small, but that is 2% of 150 billion facts. And once again, at Kalicube, we have had considerable success in adding information to this 2% for ourselves and our clients using the strategy I teach in this course.
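The arithmetic behind those percentages can be checked with a short Python sketch. It assumes the 150 billion figure is the total number of facts and that the four shares quoted above cover all of it; the dictionary keys are just labels for this example.

```python
TOTAL_FACTS = 150_000_000_000  # total facts quoted in the lesson

# Share of facts per source, in percent, as quoted for September 2022.
shares_percent = {
    "human curated trunk": 71,
    "Google's own branches": 15,
    "machine-defined facts from a seed set": 12,
    "twigs from the open web": 2,
}

# Integer arithmetic keeps the results exact.
facts_by_source = {
    source: TOTAL_FACTS * percent // 100
    for source, percent in shares_percent.items()
}

for source, count in facts_by_source.items():
    print(f"{source}: {count:,} facts")
```

Under those assumptions, the 2% of twigs from the open web works out to 3 billion facts.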
Jason Barnard speaking: 2% twigs from the open web sounds a little disappointing and not very motivating, but think about what will happen. You add a twig today, and that twig will grow, think confidence. And it will become a branch, and then Google will attach new twigs to that, and then new twigs to those, and so on and so on and so on. The sooner you add your twig to this Knowledge Vault tree, the closer you are to the main branches, the more important your branch becomes to the structure of the tree, the safer your place in the Knowledge Vault, and the more stable, reliable, and solid your Knowledge Panel will be.
Jason Barnard speaking: Now, in a Knowledge Graph, you cannot add a new entity unless it is connected to an existing entity. So, one of the tricks to adding entities to Google’s Knowledge Vault is to add your entity by indicating a relationship to one or more entities that are already in the Knowledge Vault.
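The rule that a new entity must attach to an existing one can be sketched in a few lines of Python. This is a toy model under stated assumptions: the `KnowledgeGraph` class and its `add_entity` method are hypothetical names for illustration and say nothing about how Google actually implements its Knowledge Vault.

```python
class KnowledgeGraph:
    """Toy graph that only accepts entities connected to something it already knows."""

    def __init__(self, seed_entities):
        self.entities = set(seed_entities)
        self.edges = []  # (subject, relationship, object) triples

    def add_entity(self, name, relationships):
        """Add `name` if at least one (relationship, target) pair points at a known entity."""
        anchored = [
            (relationship, target)
            for relationship, target in relationships
            if target in self.entities
        ]
        if not anchored:
            return False  # nothing to attach the new entity to
        self.entities.add(name)
        self.edges.extend((name, rel, target) for rel, target in anchored)
        return True


graph = KnowledgeGraph({"Kalicube"})

# No relationship to a known entity: rejected.
rejected = graph.add_entity("Unconnected Entity", [("works for", "Unknown Company")])

# A relationship to an entity already in the graph: accepted.
accepted = graph.add_entity("Jason Barnard", [("works for", "Kalicube")])

print(rejected, accepted, sorted(graph.entities))
```

The second call succeeds only because "Kalicube" is already in the graph, which mirrors the trick of indicating a relationship to an entity that is already known.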
Jason Barnard speaking: You are looking for relationships that are close, strong, and long. The best entities to link to are those that have the closest, strongest, and most permanent relationship. Ideally, you’ll find an entity that is as close to the trunk as possible too, but don’t try to connect your entity to a core entity unless there is a real, meaningful relationship that is one of the more close, strong, and long-lasting.
Jason Barnard speaking: For example, I wouldn’t emphasise my relationship with Sting. We both play the bass. And that is a permanent relationship, but it’s very distant and incredibly weak. I would emphasise my relationship with Boowa & Kwala, since my wife and I created them and voiced the characters in the TV series. That is very close, very strong, and permanent. Or my mother, she is a famous jazz musician. She is close to the trunk of our knowledge tree, and that is a very close, strong, and permanent relationship I have with her.
Jason Barnard speaking: More pragmatically for a company, a close relationship for a company would be an employee versus a subcontractor. A sub organisation would be a stronger relationship than a commercial partner. The company founder would be a longer relationship than the CEO.
Jason Barnard speaking: Now, the next question is what factors allow Google to reliably identify a named entity? It recognises a named entity by a combination of its name, i.e. the string of characters, the relationship it sees to other named entities, and the attributes of the entity.
Jason Barnard speaking: So, it recognises Jason Barnard firstly by the string of characters, J A S O N B A R N A R D, but there are more than 300 Jason Barnards in the world and at least 4 in the Knowledge Vault. Secondly, it recognises that I am this specific Jason Barnard through my attributes, such as my date of birth and my height. And thirdly, it looks at my relationships to other entities, my mother, my sister, my alma mater, Liverpool John Moores University, and so on.
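The three signals described here (name, attributes, relationships) can be illustrated with a simple matching sketch in Python. Everything in it is an assumption made for illustration: the scoring rule, the `match_score` function, the candidate ids, and the placeholder attribute values are hypothetical, and the real signals and weights Google uses are not public.

```python
def match_score(observed, candidate):
    """Score a candidate: 0 if the name differs, else 1 plus shared attributes and relationships."""
    if observed["name"] != candidate["name"]:
        return 0
    shared_attributes = sum(
        1
        for key, value in observed["attributes"].items()
        if candidate["attributes"].get(key) == value
    )
    shared_relationships = len(observed["relationships"] & candidate["relationships"])
    return 1 + shared_attributes + shared_relationships


# Two known entities sharing the same name (placeholder values, not real data).
candidates = [
    {
        "id": "jason-barnard-a",
        "name": "Jason Barnard",
        "attributes": {"date_of_birth": "placeholder-date-A"},
        "relationships": {("alma mater", "Liverpool John Moores University")},
    },
    {
        "id": "jason-barnard-b",
        "name": "Jason Barnard",
        "attributes": {"date_of_birth": "placeholder-date-B"},
        "relationships": {("alma mater", "Placeholder University")},
    },
]

# What a web page says about the person it mentions.
observed = {
    "name": "Jason Barnard",
    "attributes": {"date_of_birth": "placeholder-date-A"},
    "relationships": {("alma mater", "Liverpool John Moores University")},
}

scores = {c["id"]: match_score(observed, c) for c in candidates}
best = max(candidates, key=lambda c: match_score(observed, c))
print(scores, best["id"])
```

The name alone ties both candidates; the matching attribute and relationship are what separate one Jason Barnard from the other.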
Jason Barnard speaking: Now, how does it become sufficiently confident in that understanding of the named entity to add it to the Knowledge Vault? It looks at first, second, and relevant third party sources and looks for corroboration. In short, we are looking to draw Google’s attention to permanent attributes such as a date of birth, a founding date, a founder, a launch date, and also indicate solid and meaningful relationships to entities that are already in the Knowledge Vault.
Jason Barnard speaking: Importantly, don’t obsess about the trunk. Wikipedia and Wikidata are important, but they are not necessary. The tree is now solid enough that Google can add twigs and branches that do not connect to the trunk.
Jason Barnard speaking: The last part of this lesson is paths. You can look at these as indirect relationships. If you have heard of the game Six Degrees of Kevin Bacon, you’ll get the idea right away with no further explanation. The game is centred around the six degrees of separation theory, that any two people are only ever six or fewer connections apart. The game involves linking anyone in Hollywood to prolific actor Kevin Bacon via their roles in six film titles or fewer.
Jason Barnard speaking: In the case of a Knowledge Graph, it simply means that everything is ultimately connected to everything else in six steps or less. Using the Knowledge Graph, Google is mapping the path from one entity to another through tertiary entities and showing them on the SERPs in the form of carousels and other Rich Elements.
Jason Barnard speaking: For example, it understands Jason Barnard. It understands that I am an alumnus of Liverpool John Moores University. It understands that John Lennon also attended that university. And that means I am just one hop away from John Lennon, and I get a place next to him in a carousel in a Google SERP.
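Finding a path like this one is a standard graph search. The sketch below uses a breadth-first search in Python over a handful of triples taken from the lesson; the `shortest_path` function is an illustrative helper, and treating every relationship as walkable in both directions is a simplifying assumption.

```python
from collections import deque


def shortest_path(edges, start, goal):
    """Breadth-first search over triples, treating each relationship as two-way."""
    neighbours = {}
    for subject, _relationship, obj in edges:
        neighbours.setdefault(subject, set()).add(obj)
        neighbours.setdefault(obj, set()).add(subject)

    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        # Sorting only makes the exploration order deterministic.
        for entity in sorted(neighbours.get(path[-1], ())):
            if entity not in visited:
                visited.add(entity)
                queue.append(path + [entity])
    return None  # no connection found


edges = [
    ("Jason Barnard", "alumnus of", "Liverpool John Moores University"),
    ("John Lennon", "attended", "Liverpool John Moores University"),
    ("Jason Barnard", "works for", "Kalicube"),
]

path = shortest_path(edges, "Jason Barnard", "John Lennon")
print(" -> ".join(path))
```

The path found passes through a single intermediate entity, the university, which is the hop that links the two people.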
Jason Barnard speaking: We will see Google displaying items using this type of indirect path relationship in the coming years. This is going to be an amazing opportunity to get more visibility on Google properties, not only Google Search but also Discover, Images, Books, Podcasts, YouTube, and any other services Google releases.
Jason Barnard speaking: I play a game that I call Knowledge Panel hopping. It is a little like the Six Degrees of Kevin Bacon, but I am playing the game to understand the paths Google has in its Knowledge Graph and to inspire ideas about what information (attributes and relationships) I can best leverage when educating Google about an entity.
Jason Barnard speaking: Once you have got all the basics in place, Knowledge Panel hopping will enable you to better understand where you need to go next. Also, as you can see here, a quick hop back and forth uncovers information Google has in the Knowledge Graph that it didn’t initially show. And Knowledge Panel hopping is something we can expect to become natural and normal behaviour by Google’s users as it adds more Knowledge Panels and more factual information in and around those panels.
Jason Barnard speaking: Researching a film or a film star purely on Google’s SERPs is something people already do. In a few years, thoroughly researching a person, a product, or a company using just the factual information provided by Google is likely to become commonplace. And at that point, our audience using Google will be looking at us through Google’s eyes, and Knowledge Panel hopping will not be a game any more. And path analysis will be a huge part of Knowledge Panel management.