Spot complex duplicates based on multiple criteria

by Sean Kavanagh
Application:
Microsoft Excel 2000/2002/2003/2004/2007
Operating Systems:
Microsoft Windows, Macintosh
 
Identifying duplicates is one of the most difficult tasks you can perform in Excel — and also one of the most common. When you need to find duplicates based on several criteria, such as first and last name, the project can be frustrating. We’ll show you three techniques and let you decide which one best fits your needs.
 
To demonstrate three ways to find duplicates in your worksheet based on several criteria, we’ll:
       Use a COUNTIF array formula to flag duplicates.
       Concatenate criteria from multiple cells into a single value, forming a new data value that you can use to identify duplicates.
       Build another array formula, but take a different approach that allows you to base duplicates on noncontiguous cells.
 
There are various techniques for identifying duplicates, but you may run into situations that require a more sophisticated approach. If you need to base duplicates on multiple criteria — such as product code and color — you could use the Advanced Filter. However, this technique may require that you restructure the layout of your workbook, which isn’t always convenient or desirable. We’ll give you three alternative techniques.
 
Tag your potential duplicate entries
In all three techniques, we’ll tag duplicates so that you can quickly identify them, either visually or through a tool such as AutoFilter. Although these techniques will require some modification of your workbook — to the extent that you’ll store formula results in unused columns — they don’t require that you change the order of the worksheet’s columns, as may be the case with the Advanced Filter.

To demonstrate the techniques, we’ll work with the sample data shown in Figure A. We’ve sorted it by last name, which makes it easy to visually identify duplicates. This approach is fine for small sets of data, but visually reviewing hundreds or thousands of records would be a major chore. Instead, you can get a jumpstart on the cleanup process by letting Excel identify potential duplicates for you.

 

A:
You’ll need to use data from multiple columns to identify potential duplicates within this table.
 
What does “duplicate” mean to you?
The first step you need to take when looking for duplicates is defining what exactly constitutes a duplicate record. There’s no exact science to the process — often the decision comes down to a gut feeling of which combination of data will yield the best results for your specific needs. For example, if you base duplicates on all the data, you’ll miss obvious duplicates because of minor inconsistencies, such as an address entered with Dr. instead of Drive. If you base duplicates on too few fields, you may identify false duplicates.
  
Identify duplicates based on adjacent cells
For the sake of our first example, let’s say you’ve decided to look for duplicates based on a combination of first and last names. If the columns of data that you’re using as duplicate criteria are next to one another, as in this case, you can use a simple

 

Join Now Close