How to Use the Ngram Viewer Tool in Google Books

Learn how to research using this Google Books Ngram Viewer tutorial

[Image: An Ngram of "vinegar pie." (Marziah Karch)]

An Ngram, also called an N-gram, is a contiguous sequence of n items (where n is a number) taken from a sample of text or speech. Ngram analysis counts how often such sequences occur in the text.

The items can be all sorts of things, including phonemes, prefixes, words, and letters. Although Ngrams are little known outside the research community, they are used in a variety of fields and matter a great deal to developers who write programs that understand and respond to natural spoken language.
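To make the definition concrete, here is a minimal Python sketch that splits a sequence into Ngrams and counts them. The `ngrams` function is a hypothetical helper written for illustration; it is not how Google builds its data sets. It shows that the same idea works whether the items are words or letters.

```python
from collections import Counter


def ngrams(items, n):
    """Return every contiguous run of n items, as tuples, in order."""
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]


# Word-level bigrams (n = 2): each item is a word.
word_bigrams = ngrams("the cat sat on the mat".split(), 2)
print(word_bigrams[:3])  # [('the', 'cat'), ('cat', 'sat'), ('sat', 'on')]

# Letter-level trigrams (n = 3): each item is a character.
letter_trigrams = ["".join(gram) for gram in ngrams("banana", 3)]
print(Counter(letter_trigrams).most_common(1))  # [('ana', 2)]
```

Counting the resulting tuples, as `Counter` does here, is the basic operation behind any Ngram frequency chart.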

In the case of the Google Books Ngram Viewer, the text to be analyzed comes from the millions of books that Google scanned to populate its Google Books search engine. Google refers to the body of text you search as the corpus. The Ngram Viewer organizes its corpora by language, and for English you can analyze British and American English separately or together.

How the Ngram Viewer Works

  1. Go to Google Books Ngram Viewer at books.google.com/ngrams.

  2. Type any phrase or phrases you want to analyze. Separate each phrase with a comma. Google suggests, "Albert Einstein,Sherlock Holmes,Frankenstein" to get you started.

    In Ngram Viewer searches, items are case-sensitive, unlike in Google web searches.

  3. Select a date range. The default is 1800 to 2000.

  4. Choose a corpus. You can search foreign language texts or English texts, and in addition to the standard choices, you may notice entries such as "English (2009)" or "American English (2009)" at the bottom of the list. These are older corpora that Google has since updated, but you may have some reason to make your comparisons against old data sets. Most users can ignore them and focus on the most recent corpora.

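  5. Set the smoothing level. Smoothing replaces each year's value with the average of that year and the years around it; at the default level of 3, each point is the average of that year and the three years on either side. A smoothing level of 0 shows the raw yearly data, which is the most accurate representation but can be spiky and hard to read. In most cases, you don't need to adjust it.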

  6. Press Search lots of books.
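The settings from these steps end up in the address of the results page, which is handy for saving or sharing a search. The Python sketch below rebuilds such a link. It assumes the parameter names `content`, `year_start`, `year_end`, and `smoothing` as they appear in the Viewer's address bar after a search; they are not a documented API and could change. The corpus choice from step 4 is deliberately left out because its values have changed between corpus versions, and `ngram_viewer_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumption: parameter names copied from the Viewer's address bar, not a stable API.
BASE_URL = "https://books.google.com/ngrams/graph"


def ngram_viewer_url(phrases, year_start=1800, year_end=2000, smoothing=3):
    """Build an Ngram Viewer link for a list of phrases (hypothetical helper)."""
    if year_start > year_end:
        raise ValueError("year_start must not be later than year_end")
    if smoothing < 0:
        raise ValueError("smoothing must be 0 or greater")
    params = {
        "content": ",".join(phrases),  # step 2: phrases separated by commas
        "year_start": year_start,      # step 3: date range
        "year_end": year_end,
        "smoothing": smoothing,        # step 5: smoothing level
    }
    # urlencode turns spaces into "+" and commas into "%2C".
    return BASE_URL + "?" + urlencode(params)


print(ngram_viewer_url(["Albert Einstein", "Sherlock Holmes", "Frankenstein"]))
```

Pasting the printed link into a browser should open the same comparison as typing Google's suggested phrases into the search box, provided the parameter names still match.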

Using Google's Ngram Viewer, you can drill down into the data. If you'd like to search for the verb fish instead of the noun fish, you can do so by using tags. In this case, you'd search for fish_VERB.
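Because phrases are separated by commas, you can put the verb and the noun on the same graph. This small Python sketch builds such a query string. The `tagged` helper is hypothetical, and the tag set is an assumed subset (NOUN, VERB, ADJ, ADV); check Google's documentation for the full, current list.

```python
# Assumption: a subset of the Viewer's part-of-speech tags, not the full list.
KNOWN_TAGS = {"NOUN", "VERB", "ADJ", "ADV"}


def tagged(word, tag):
    """Return a tagged search term such as 'fish_VERB' (hypothetical helper)."""
    tag = tag.upper()
    if tag not in KNOWN_TAGS:
        raise ValueError(f"unsupported tag: {tag}")
    return f"{word}_{tag}"


# Compare the verb and the noun in a single search.
query = ",".join([tagged("fish", "verb"), tagged("fish", "noun")])
print(query)  # fish_VERB,fish_NOUN
```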

Google provides a complete list of commands and other advanced documentation for use with Ngram Viewer on its website.

What Is Ngram Showing?

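Google Books Ngram Viewer outputs a graph that shows how often a particular phrase appears in books over time. Each point is a percentage: the phrase's occurrences in a given year divided by all phrases of the same length (for example, all two-word phrases) in that year's books. If you entered more than one word or phrase, each one gets its own color-coded line so you can compare the search terms. This is similar to Google Trends, except that Ngram Viewer tracks books rather than web searches and covers a much longer period.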
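The values on the graph are relative frequencies rather than raw counts, so a year with many more books does not automatically produce a higher line. A minimal Python sketch of that normalization follows; the counts are made up for illustration and are not real Google Books data, and `relative_frequency` is a hypothetical helper.

```python
def relative_frequency(phrase_counts, total_counts):
    """Percentage of all same-length Ngrams in each year that match the phrase."""
    return {
        year: 100 * phrase_counts.get(year, 0) / total
        for year, total in total_counts.items()
        if total > 0  # skip years with no books in the sample
    }


# Hypothetical toy counts, not real Google Books data.
phrase_counts = {1900: 3, 1950: 12}
total_counts = {1900: 1_000_000, 1950: 4_000_000}

print(relative_frequency(phrase_counts, total_counts))
# {1900: 0.0003, 1950: 0.0003}
```

The phrase appears four times as often in 1950, but because four times as many phrases were published that year, both points land at the same height on the graph.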

Case Study

Consider the case study of vinegar pies. They're mentioned in Laura Ingalls Wilder's Little House on the Prairie series. Exploring with Google's web search to learn more about vinegar pies reveals that they're considered part of American Southern cuisine and are indeed made with vinegar. They hearken back to times when not everyone had access to fresh produce at all times of the year, but is that the whole story?

Search Google Ngram Viewer for vinegar pie, and you'll encounter some mentions of the pie in both the early and late 1800s, many mentions in the 1940s, and an increasing number of mentions in recent times. However, with a smoothing level of 3, the mentions in the 1800s appear as a plateau. Because relatively few books were published during that time and because smoothing averages each year with its neighbors, the picture is distorted: probably only one book mentioned vinegar pie, and that single mention was spread across the surrounding years instead of showing up as a spike. Setting the smoothing to 0 confirms this. The spike centers on 1869, and there are further spikes in 1897 and 1900.
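The plateau effect is easy to reproduce. The Python sketch below applies a centered moving average (each year averaged with `level` years on either side, which is how Google describes its smoothing) to a hypothetical series containing a single mention in 1869. The `smooth` helper is illustrative, and its handling of the first and last years is an assumption: the window is simply truncated at the ends of the series.

```python
def smooth(values, level):
    """Centered moving average over `level` neighbors on each side.

    Assumption: near the ends of the series, the window is truncated.
    """
    smoothed = []
    for i in range(len(values)):
        window = values[max(0, i - level): i + level + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


# Hypothetical yearly counts: one mention in 1869 and nothing else.
years = list(range(1860, 1879))
counts = [1 if year == 1869 else 0 for year in years]

raw = smooth(counts, 0)       # level 0: every window holds one year, so no change
smoothed = smooth(counts, 3)  # level 3: seven-year windows

plateau = [year for year, value in zip(years, smoothed) if value > 0]
print(plateau)  # [1866, 1867, 1868, 1869, 1870, 1871, 1872]
print(round(smoothed[years.index(1869)], 3))  # 0.143
```

At level 3, the single spike turns into a flat seven-year shelf at one-seventh of its original height, which is why switching to 0 reveals the individual years.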

It's unlikely that nobody talked about vinegar pies the rest of the time. There were probably recipes floating all over the place, but people didn't write about them in books, and that's an important limitation of Ngram searches: they reflect only what was published in books.