Linux uniq Command: What It Is and How to Use It

Make short work of hunting through textual data

Linux (and its predecessor, Unix) was built on plain text. As a result, it has all sorts of useful text-processing tools you can use from the terminal. The Linux uniq utility is designed to help you sort through text files and pick out unique values.

What Is Linux uniq and When Would You Use It?

The uniq command comes installed on most Linux distributions out of the box, and belongs to the coreutils package. It's used to identify and 'collapse' lines of adjacent, identical text. Let's unpack this definition a bit. 

  • The basic unit of comparison is a line of text, i.e. all the text from one newline to the next. A line can contain multiple sentences, provided they're all in the same paragraph.
  • By default, uniq compares only adjacent lines. This means that if two identical lines have a different line between them, both will still appear in the output unless you apply additional options to the command (more on this later).
  • In this context, "collapsing" means that when uniq displays its output, it includes only the first occurrence of each repeated line.

The uniq command helps you sift through large amounts of data, identify which lines are the same, and remove the duplicates from the output.
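
To see the adjacency rule in action, here's a quick illustration (the sample words are made up, and printf is simply a convenient way to produce a few lines of input):

$ printf 'apple\napple\nbanana\napple\n' | uniq
apple
banana
apple

The two adjacent "apple" lines collapse into one, but the final "apple" survives because a different line sits between it and the first group.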

Basic Usage of Linux uniq Command

Removing Duplicate Lines with uniq

At its most basic, the Linux uniq command takes the following form:

uniq -o value /path/to/inputfile

Here, "o" stands in for the single-letter, shorthand version of one of the command's options (not every option takes a value). Most options can also be entered in a longer form, such as:

uniq --option=value /path/to/inputfile

The input file must be a plain text file containing your data. The uniq command has many options, but it may not be obvious how to use them to produce useful output. We'll take a deeper dive into some of them in the sections below.

Removing Adjacent Duplicates With the uniq Command

In its most basic form, the uniq command will 'collapse' adjacent duplicates and display the results. For example, let's say you're starting a new blog and have a list of people who signed up for your email newsletter (newsletter.txt), but are not yet members.

Jsmith@example.com 
Jsmith@example.com 
Tmiller@example.com 
Mjones@example.com 
Mjones@example.com 

Since you wouldn't want to bother these people more than once, you can de-duplicate this with the following: 

$ uniq newsletter.txt 
Jsmith@example.com 
Tmiller@example.com 
Mjones@example.com 

Admittedly, this on its own isn't very exciting. If a third occurrence of "Jsmith@example.com" existed at the end of the file, it would remain in the output, because it isn't adjacent to the other two. That's why it's important to learn some of the options for this command.
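
To see this limitation for yourself, you could append another copy of that address to the end of newsletter.txt and run uniq again (a quick sketch; echo simply adds the line to the file):

$ echo 'Jsmith@example.com' >> newsletter.txt
$ uniq newsletter.txt
Jsmith@example.com
Tmiller@example.com
Mjones@example.com
Jsmith@example.com

The trailing copy isn't adjacent to the two at the top, so uniq treats it as a new line.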

Counting the Number of Occurrences With uniq

Let's suppose your blog takes off and not only are people registering, they're subscribing! For money! And why wouldn't they? The list of payments you're receiving will start to grow.

Smith John Jsmith@example.com $3.00
Smith John Jsmith@example.com $3.00
Smith John Jsmith@example.com $3.00
Smith John Jsmith@example.com $3.00
Smith John Jsmith@example.com $3.00
Smith John Jsmith@example.com $3.00
Smith John Jsmith@example.com $3.00
Smith John Jsmith@example.com $3.00
Peters Aaron Apeters@example.com $10.00
Peters Aaron Apeters@example.com $10.00
Peters Aaron Apeters@example.com $10.00
Miller Tim Tmiller@example.com $1.00
Miller Tim Tmiller@example.com $1.00
Miller Tim Tmiller@example.com $1.00
Miller Tim Tmiller@example.com $1.00
Miller Tim Tmiller@example.com $1.00
Miller Tim Tmiller@example.com $1.00
Jones Mary Mjones@example.com $5.00
Jones Mary Mjones@example.com $5.00
Jones Mary Mjones@example.com $5.00
Jones Mary Mjones@example.com $5.00
Jones Fred Fjones@example.com $4.00
Jones Fred Fjones@example.com $4.00
Jones Fred Fjones@example.com $4.00
Jones Fred Fjones@example.com $4.00
Jones Fred Fjones@example.com $4.00

At some point, you'll want to take stock of how long some of your subscribers have been with you. Given the above list of their payments to date, you can have uniq count the number of occurrences with the -c flag:

$ uniq -c payments.txt
     8 Smith John Jsmith@example.com $3.00
     3 Peters Aaron Apeters@example.com $10.00
     6 Miller Tim Tmiller@example.com $1.00
     4 Jones Mary Mjones@example.com $5.00
     5 Jones Fred Fjones@example.com $4.00

However, this again relies on the duplicate lines being adjacent; if any weren't, you'd have duplicates in the output of a program that's designed to de-duplicate! For this reason, uniq is most useful in conjunction with the sort command.

Displaying Unique Lines with sort and uniq Commands

The sort command helps us here as it will arrange duplicated lines so they are adjacent, thereby allowing uniq to filter them out. For example, imagine the above payment report didn't come nicely ordered:

Smith John Jsmith@example.com $3.00 
Jones Fred Fjones@example.com $4.00
Miller Tim Tmiller@example.com $1.00
Peters Aaron Apeters@example.com $10.00
Jones Mary Mjones@example.com $5.00
Peters Aaron Apeters@example.com $10.00
Miller Tim Tmiller@example.com $1.00
Jones Fred Fjones@example.com $4.00
Smith John Jsmith@example.com $3.00
Jones Fred Fjones@example.com $4.00
Peters Aaron Apeters@example.com $10.00
Jones Fred Fjones@example.com $4.00
Jones Fred Fjones@example.com $4.00
Miller Tim Tmiller@example.com $1.00
Jones Mary Mjones@example.com $5.00
Smith John Jsmith@example.com $3.00
Miller Tim Tmiller@example.com $1.00
Smith John Jsmith@example.com $3.00
Smith John Jsmith@example.com $3.00
Smith John Jsmith@example.com $3.00
Smith John Jsmith@example.com $3.00
Jones Mary Mjones@example.com $5.00
Jones Mary Mjones@example.com $5.00
Miller Tim Tmiller@example.com $1.00
Miller Tim Tmiller@example.com $1.00
Smith John Jsmith@example.com $3.00

In this case, you'd want to run the list through sort first to group all the like items together, then run uniq. This uses the pipe operator on the command line ("|"), which feeds the output of the command before the pipe directly into the command after it. So when we run this on our mixed-up payments, we get the unique results (with their counts):

$ sort payments-rand.txt | uniq -c
     5 Jones Fred Fjones@example.com $4.00
     4 Jones Mary Mjones@example.com $5.00
     6 Miller Tim Tmiller@example.com $1.00
     3 Peters Aaron Apeters@example.com $10.00
     8 Smith John Jsmith@example.com $3.00
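
From here it's easy to extend the pipeline. For instance, if you wanted to rank subscribers by how many payments they've made, you could sort the counts numerically in reverse order (one possible extension, using the same files and figures as above):

$ sort payments-rand.txt | uniq -c | sort -rn
     8 Smith John Jsmith@example.com $3.00
     6 Miller Tim Tmiller@example.com $1.00
     5 Jones Fred Fjones@example.com $4.00
     4 Jones Mary Mjones@example.com $5.00
     3 Peters Aaron Apeters@example.com $10.00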

Use the uniq Command for Quick Data Analysis

As you get more familiar with the Linux command line, you'll find tons of useful programs like uniq. Sure, you could open the above in Excel and sort that way, but then you wouldn't start earning any tech cred, now would you?