How to Use the 'Ngram Viewer' Tool in Google Books

An Ngram of "vinegar pie."

Marziah Karch

An Ngram, also commonly called an N-gram, is a sequence of n (a number) consecutive items in a piece of text or speech. N-gram analysis is the statistical study of how often those sequences occur.

The items could be all sorts of things, like phonemes, prefixes, phrases, or letters. Although the N-gram is somewhat obscure outside the research community, it is used in a variety of fields, and it has many implications for developers writing programs that understand and respond to natural spoken language.
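To make the idea concrete, here is a minimal Python sketch that pulls N-grams out of a sentence and counts them. The `ngrams` helper and the sample sentence are invented for illustration; this is not how Google processes its books, only a small demonstration of the concept.

```python
from collections import Counter


def ngrams(items, n):
    """Return every run of n consecutive items as a tuple."""
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]


# Sample sentence made up for this illustration.
text = "the quick brown fox jumps over the quick brown dog"
words = text.split()

# Word-level 2-grams, counted by how often each one occurs.
bigram_counts = Counter(ngrams(words, 2))
print(bigram_counts[("the", "quick")])  # this pair appears twice in the sentence

# The same helper works on letters: 2-grams of the characters in one word.
print(ngrams(list("pie"), 2))
```

The same function handles words, letters, or any other list of items, which is why the "items" in an N-gram can be so varied.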

In the case of Google Books Ngram Viewer, the text to be analyzed comes from the vast number of books Google has scanned from public libraries to populate its Google Books search engine. Google refers to the text you are going to search as the corpus. The Ngram Viewer aggregates by language, although you can analyze British and American English separately or lump them together.

How Ngram Works

  1. Go to Google Books Ngram Viewer at books.google.com/ngrams.

  2. Type any phrase or phrases you wish to analyze. Separate each phrase with a comma. Google suggests, "Albert Einstein, Sherlock Holmes, Frankenstein" to get you started. Items are case-sensitive, unlike Google web searches.

  3. Type a date range. The default is 1800 to 2000.

  4. Choose a corpus. You can search foreign-language texts or English, and in addition to the standard choices, you may notice things like "English (2009)" or "American English (2009)" at the bottom. These are older corpora that Google has since updated, but you may have some reason to make your comparisons against old data sets. Most users can ignore them and focus on the most recent corpora.

  5. Set your smoothing level. Smoothing controls how much each year's value is averaged with the years around it: at a level of 3, each point is the average of that year and up to three years on either side. A level of 0 shows the raw data and is the most accurate representation, but that setting may be difficult to read. The default is 3. In most cases, you don't need to adjust it.

  6. Press the Search lots of books button.
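If you run the same kind of search often, the steps above can be sketched as a link you build yourself. The following Python example is only a sketch: the `/ngrams/graph` path and the parameter names (`content`, `year_start`, `year_end`, `smoothing`) are assumptions based on what shows up in the browser's address bar after a search, not a documented or guaranteed interface, and `ngram_viewer_url` is a hypothetical helper. The corpus is left out on purpose, because its identifiers differ between corpus releases.

```python
from urllib.parse import urlencode

# Assumption: this path and these parameter names mirror the viewer's own
# result URLs; they are not an official, stable API and may change.
BASE_URL = "https://books.google.com/ngrams/graph"


def ngram_viewer_url(phrases, year_start=1800, year_end=2000, smoothing=3):
    """Build a link to the Ngram Viewer for a list of phrases."""
    params = {
        "content": ",".join(phrases),  # step 2: phrases separated by commas
        "year_start": year_start,      # step 3: start of the date range
        "year_end": year_end,          # step 3: end of the date range
        "smoothing": smoothing,        # step 5: smoothing level
    }
    # urlencode escapes spaces and commas so the link is safe to paste.
    return f"{BASE_URL}?{urlencode(params)}"


url = ngram_viewer_url(["Albert Einstein", "Sherlock Holmes", "Frankenstein"])
print(url)
```

Opening the printed link in a browser should have the same effect as filling in the form by hand, provided the assumed parameter names still match the live site.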

Google allows you to drill down quite a bit with the Ngram Viewer. If you'd like to search for "fish" the verb instead of "fish" the noun, you can do so by using tags. In this case, you'd search for "fish_VERB".

Google provides a complete list of commands you can use and other advanced documentation on their website. 

What Is Ngram Showing?

Google Books Ngram Viewer will output a graph that represents the use of a particular phrase in books through time. If you have entered more than one word or phrase, you will see color-coded lines to contrast the different search terms. The result is similar to a Google Trends chart, except that it draws on books rather than web searches and covers a much longer period of time.

Case Study

Consider the case study of vinegar pies. They're mentioned in Laura Ingalls Wilder's Little House on the Prairie series. Exploring with Google's Web search to learn more about vinegar pies reveals that they're considered part of American Southern cuisine and really are made from vinegar. They hearken back to times when not everyone had access to fresh produce at all times of the year. But is that the whole story?

Search Google Ngram Viewer for vinegar pie and you'll find some mentions of the pie in both the early and late 1800s, a lot of mentions in the 1940s, and an increasing number of mentions in recent times. However, with a smoothing level of 3, you'll see a plateau over the mentions in the 1800s. Because few books were published during that time, and because the data is smoothed, the picture is distorted. Probably a single book mentioned vinegar pie, and its spike was averaged out into a plateau. Setting the smoothing to 0 shows that this is exactly the case: the spike centers on 1869, and there are further spikes in 1897 and 1900.
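To see why one book can turn into a plateau, here is a small Python sketch of smoothing as a centered moving average. It assumes that a smoothing level of n averages each year with up to n years on either side, and the numbers are invented for illustration; they are not real Ngram data.

```python
def smooth(values, level):
    """Average each value with up to `level` neighbors on either side."""
    smoothed = []
    for i in range(len(values)):
        # Near the ends of the list the window simply holds fewer values.
        window = values[max(0, i - level): i + level + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


# Made-up yearly values: one book causes a single spike, zero elsewhere.
raw = [0] * 7 + [7] + [0] * 7

print(smooth(raw, 0))  # level 0: the lone spike stays as it is
print(smooth(raw, 3))  # level 3: the spike is spread over seven years
```

With a level of 3, the single spike becomes a flat run of seven equal values, which is the kind of plateau the graph shows over the 1800s.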

It's unlikely that nobody talked about vinegar pies the rest of the time. Recipes were likely floating all over the place, but people just didn't write about them in books. That's an important limitation of these Ngram searches.