Remove Duplicate Rows of Data in Excel

Image of two people reviewing data in a spreadsheet

People Images/Getty Images

When working with large amounts of data, removing or cleaning up duplicate records can be a lot of work. Excel simplifies this task. Below are two possible scenarios and solutions to streamline your spreadsheets.

Note: The information in this article applies to Excel 2019, Excel 2016, Excel 2013, Excel 2010, and Excel 2007.

01
of 02

Remove Duplicate Data Records in Excel

Search for Identical Records by Field Name with Remove Duplicates in Excel

Spreadsheet programs such as Excel are often used as databases for things like parts inventories, sales records, and mailing lists.

Databases in Excel are comprised of tables of data that are organized into rows of data called records. In a record, the data in each cell or field in the row is related, such as a company's name, address, and phone number.

A common problem that occurs as a database grows in size is the possibility of duplicate records or rows of data. This duplication occurs when:

  • Entire records are entered into the database more than once. This results in two or more identical records
  • Multiple records have one or more fields, such as a name and address, that contain the same data.

Duplicate records cause problems. One example problem is that a duplicate record can send multiple copies of documents to the same person when the database information is used in a mail merge. To prevent problems, scan for and remove duplicate records on a regular basis.

It's easy to pick out duplicate records in a small sample like the one in the image above. But when data tables contain hundreds or thousands of records, it gets difficult to pick out duplicate records, especially partially matching records.

To make it easier to accomplish this task, Excel has a built in data tool called Remove Duplicates. Remove Duplicates finds and removes identical and partially matching records.

When you use Remove Duplicates, identical and partially matching records must be dealt with separately. This is because the Remove Duplicates dialog box displays the field names for the selected data table and you choose which fields to include in the search for matching records:

  • For identical records, search for all fields. Leave check marks next to all column or field names.
  • For partially matching records, leave check marks next to the fields to be matched.

Field Names vs. Column Letters

The Remove Duplicates tool consist of a dialog box where you choose which matching fields to search for by selecting the desired field or column names.

The information that the dialog box displays, either field names or column letters, depends on whether your data contains a row of headings, or headers, at the top of the data table as seen in the image above.

If your data has headers, place a check mark next to My data has headers. This causes Excel to display the names in this row as field names in the dialog box.

If your data does not have a header row, the dialog box displays the appropriate column letters for the selected range of data.

Contiguous Range of Data

For the Remove Duplicates tool to work properly, the data table must be a contiguous range of data. The data table must not contain any empty rows, columns, and, if at all possible, no empty cells.

Not having blanks within a data table is a good practice when it comes to data management and not just when searching for duplicate data. Excel's other data tools, such as sorting and filtering, work best when the data table is a contiguous range of data.

Remove Duplicate Data Records Example

In the image above, the data table contains two identical records for A. Thompson and two partially matching records for R. Holt. In the example, all fields match except the student number.

The steps listed below detail how to use the Remove Duplicates data tool to:

  • Remove the second of two identical records for A. Thompson.
  • Remove the second partially matching record for R. Holt.

Open the Remove Duplicates Dialog Box

  1. Select a cell containing data in the sample database.
  2. Select the Data tab.
  3. Select Remove Duplicates to highlight all data in the data table and to open the Remove Duplicates dialog box.

Here's what you'll find in the Remove Duplicates dialog box:

  • The Remove Duplicates dialog box displays all the column headings or field names from the data sample.
  • The check marks next to the field names indicate which columns Excel will match when searching for duplicate records
  • When the dialog box opens, all the field names are selected.

Find Identical Records

This tutorial example searches for identical records. Select all the column headings and select OK.

Here's the result:

  • The dialog box closes and is replaced by the message: 1 duplicate values found and removed; 7 unique values remain.
  • The row containing the duplicate A. Thompson record is removed from the database
  • There are two partially matching records for R. Holt because not all fields matched. The student number for the two records differ. Excel considers each record to be a unique data record.
02
of 02

Find and Remove Partially Matching Records with Remove Duplicates

Excel Remove Duplicates - Searching for Partially Matching Records by Field Name

Check One Field at a Time

In the previous example, Excel removed data records that exactly match the selected fields of data. To find partially matching data records, remove the check mark for only one field at a time, as shown in the steps below.

Subsequent searches for records that match in all fields except for name, age, or program remove all possible combinations for partially matching records.

Find Partially Matching Records

  1. Select a cell containing data in the data table, if necessary.
  2. Select Data.
  3. Select Remove Duplicates to highlight all data in the data table and to open the Remove Duplicates dialog box.
  4. All field names or column headings for the data table are selected.
  5. To find and remove records that do not have a match in every field, remove the check mark next to those field names that you want Excel to ignore.
  6. For this example, remove the check mark next to the Student ID column heading.
  7. Excel searches and removes records that have matching data in the Last Name, Initial, and Program fields.
  8. Select OK.
  9. The dialog box closes and is replaced by the message: 1 duplicate values found and removed; 6 unique values remain.
  10. The row containing the second record for R. Holt with the Student ID of ST348-252 is removed from the database.
  11. Select OK to close the message box.

The example data table is now free of all duplicate data.