Google Knowledge Graph and Knowledge Based Trust

Ranking Articles Based on Fact


There's a lot of misinformation on Google. Websites with quack cures and conspiracy theories can end up outranking websites with legitimate information, and naive users may assume that the most relevant results are also the most accurate. Google offers Knowledge Graph boxes in search results to help users find facts quickly, but it may also be working on adjusting search rankings to take that a step further.


Google may be working on a way to downgrade results with incorrect information and boost results with more facts, as suggested by an article in New Scientist.

The Academic Roots of Google Search Rank

Google uses a variety of methods to rank websites these days. The backbone of Google's original search algorithm, PageRank, borrowed from academia to determine the credibility and relevance of websites. PageRank uses a variation of citation indexing. Academics count the number of times a particular researcher has been cited in other academic works and use that as a proxy for how influential that researcher is. If you're curious, check Google Scholar. It lists citation metrics for scholars (the h-index and i10-index).

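PageRank applies that same logic to web pages. Pages with more inbound links are considered to be more influential than pages with few or no inbound links, and a link from an influential page counts for more than a link from an obscure one.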
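
To make the link-counting idea concrete, here is a minimal sketch of the textbook PageRank iteration. It is an illustration only, not Google's production ranking code. The damping factor of 0.85 is the commonly quoted value, and the toy_web site names are hypothetical.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank sketch. `links` maps each page to the pages it links to."""
    # Collect every page that appears as a source or as a link target.
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current rank evenly among its outbound links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: three sites link to hub.example.
toy_web = {
    "a.example": ["hub.example"],
    "b.example": ["hub.example"],
    "c.example": ["hub.example", "a.example"],
    "hub.example": [],
}
ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

In this toy web, hub.example ends up on top because every other page links to it, and a.example outranks b.example because it has one inbound link while b.example has none.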

There are some obvious problems with both systems.

You can cite or link yourself. You could be citing a journal article or linking to a website because you want to criticize it. (When linking to websites to criticize them, add the rel="nofollow" attribute to the link to avoid boosting their rank.) In the case of search rank, there are also spambots, fake websites, and other dirty tricks for gaming the system, and Google plays whack-a-mole trying to stop them.


Knowledge Based Trust

This interesting paper from Google suggests that the company is working on a way to leverage known true information to rank pages for trustworthiness:

We propose a new approach that relies on endogenous signals, namely, the correctness of factual information provided by the source. A source that has few false facts is considered to be trustworthy...

...We call the trustworthiness score we computed Knowledge-Based Trust (KBT). On synthetic data, we show that our method can reliably compute the true trustworthiness levels of the sources.  

It seems to imply that this Knowledge-Based Trust score would be used as an additional signal for ranking websites, alongside the existing ones. It would not be the only signal, so everyone writing fan-fiction can relax. Your pages won't totally disappear without facts. However, publishing misinformation might lower a page's rank.
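
As a rough illustration of the "few false facts" idea, the sketch below scores a page by the share of its checkable facts that agree with a reference knowledge base. This is a deliberate simplification and not the method from the paper, which describes a more involved model. The function name, the (subject, predicate, value) fact format, and the sample facts are all assumptions made for the example.

```python
def knowledge_based_trust(extracted_facts, knowledge_base):
    """Toy trust score: fraction of checkable facts that match the knowledge base.

    Returns None when the page states nothing the knowledge base can check.
    """
    checked = 0
    correct = 0
    for subject, predicate, value in extracted_facts:
        known_value = knowledge_base.get((subject, predicate))
        if known_value is None:
            continue  # The knowledge base has no opinion on this fact.
        checked += 1
        if value == known_value:
            correct += 1
    return correct / checked if checked else None


# Hypothetical reference facts, keyed by (subject, predicate).
knowledge_base = {
    ("Earth", "orbits"): "Sun",
    ("water", "boils_at_celsius"): "100",
}

accurate_page = [("Earth", "orbits", "Sun"), ("water", "boils_at_celsius", "100")]
sloppy_page = [("Earth", "orbits", "Moon"), ("water", "boils_at_celsius", "100")]
fan_fiction = [("Frodo", "lives_in", "Seattle")]

print(knowledge_based_trust(accurate_page, knowledge_base))  # 1.0
print(knowledge_based_trust(sloppy_page, knowledge_base))    # 0.5
print(knowledge_based_trust(fan_fiction, knowledge_base))    # None
```

The fan-fiction page gets no score at all rather than a low one, which matches the point above: a page without checkable facts is not penalized by this signal, while a page with wrong facts is.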

The Knowledge Graph Database

It's certainly a fantastic step forward to have accurate information about vaccines or climate change ranking higher than misinformation, but there may be a few flaws with the idea. Where is Google getting the facts for comparison? Mostly Wikipedia.

Knowledge Graph is Google's semantic knowledge base, and its results appear in a box to the right of regular search results.

Search for "banana," for example, and you'll see a Knowledge Graph box with a picture of bananas and quick facts about the fruit. The source for the "banana" result is the USDA, so clearly Wikipedia isn't the only data source for Knowledge Graph. Previously, Knowledge Graph also used structured data from a Google-owned project called "Freebase." Freebase was retired, and its data was folded into Wikidata, a sister project of Wikipedia. Essentially, Wikipedia powers a good chunk of Google's knowledge base with structured data formatted in a way Google can easily read. Google is not the only company doing this. Siri, Yahoo, and Bing use Wikipedia to answer common questions, too.


Using Wikipedia is free and simple, and usually correct. Usually. There's the problem. It's the encyclopedia that "anyone can edit." School teachers and professors frown on papers that use Wikipedia as a source. Wikipedia won't even accept Wikipedia as a reliable source. 

Google currently uses Wikipedia both as a source of facts and as an indicator of relevance. Search for someone or something with a Wikipedia page, and you'll get a Knowledge Graph box in the answer. Search for someone or something without one, and you likely won't. There's a very tiny "feedback" link you can use to report errors in the information - if you happen to notice both the tiny gray link and the error.

Hoax edits in Wikipedia have gone undiscovered for years. The same hoax edits have been cited as fact by other publications, which could then, in turn, be used as a "reliable source" for a reference in a Wikipedia article. (Neat!) People can create pages about themselves or maliciously edit pages about people or things they dislike, so long as they adequately mimic other Wikipedia articles and standards. There are even PR firms that specialize in creating paid promotional pages for their clients. This has led to a complex system of rules and anti-spam measures that tend to discourage new editors from joining Wikipedia or making more than a handful of edits. Wikipedia editors often spend more time and energy debating each other and trying to catch self-promoters than they spend fixing the grammatical errors and inconsistencies in Wikipedia articles.

As a consequence, Wikipedia suffers from groupthink and systemic bias. The pages are maintained by a shrinking number of core editors, and those editors are overwhelmingly male. Female-centric topics are often underreported or deemed "un-notable."

That's a long-winded way to say that Google needs some better alternatives to Wikipedia for building knowledge-based rankings into their metrics and determining what triggers the Knowledge Graph. This is where you can help. 

Make Your Own Rich Snippets

If you are a web designer, you can take matters into your own hands and make your own structured data. That gives Google more data sources, but it also visually boosts results for your site. Don't abuse this with garbage data because Google does spot checks.
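
One common way to publish structured data is a JSON-LD block that uses the schema.org vocabulary. The sketch below builds such a block for an article page. The helper name article_json_ld and the sample headline, author, and date are hypothetical, and including this markup does not guarantee that Google will display a rich result.

```python
import json


def article_json_ld(headline, author_name, date_published):
    """Build a JSON-LD <script> tag describing an article with schema.org terms."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author_name},
        "datePublished": date_published,  # ISO 8601 date string
    }
    body = json.dumps(data, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Hypothetical values for illustration only.
snippet = article_json_ld("Example headline", "Jane Doe", "2015-03-01")
print(snippet)
```

Paste the printed tag into the HTML of the page it describes, and make sure the values match what visitors actually see on that page.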