HIGHLIGHT DUPLICATES IN EXCEL

CA GIRISH (Chartered Accountant) (1796 Points)

05 May 2010  

Finding duplicate values in the same column is easy; you can sort or apply a filter depending on the circumstances. Finding duplicates that span multiple columns is a tad more difficult. A sort can work, but then you have to find the duplicate values. So while it’s better than no solution at all, it’s not a good solution. You probably want a solution that kind of screams out at you Here I am! I’m a duplicate!

Let’s take a quick look at a simple example. The sheet below contains a column of dates and a column of initials. A few dates are repeated and a few initial sets are repeated; they represent duplicates within those columns. However, we’re interested in records that repeat the same date and the same initials. That’s what I mean by a multi-column duplicate. It’s fairly easy to spot the duplicates, rows 8 and 9, in such a simple sheet, but what if you had hundreds or thousands of rows to check?

An AutoFilter will work, but it’s a vulnerable solution. In this case, there are five distinct dates. That means a user has to review at least five sets of records to find duplicates. Even then, you have to trust your user to actually spot them. It’s not a good solution, especially if you have lots of data.

You might try an advanced filter or even conditional formatting, but both would require so much hoop-jumping that I’m not sure the end result would be worth the fuss, when compared to the easy-to-implement solution that follows:

  • Concatenate the columns you’re checking.

  • Use CountIf() to count the number of combined values.

Let’s enhance the sheet above to see how this works:

  1. In cell C2 enter the formula =A2&B2. (You can combine more columns.)

  2. Copy the formula to C3:C9. Excel uses each date’s serial value, but that won’t interfere with the technique. However,  if your values contain times, it might, depending on how the time values were entered.

  1. In cell D2 enter the following formula:

    =IF(COUNTIF($C$2:C2,C2)>1, “Here I am! I’m a duplicate!”,”Original”)

  2. Copy the formula to D3:D9. At this point, finding multi-column duplicates is as easy as sorting by column D (although this example doesn’t require any sorting).

The IfCount() function counts the number of times the concatenated values occur within the extending range. If the count is greater than 1, the formula returns the string “Here I am! I’m a duplicate!”; when the count isn’t greater than 1, the formula returns the string “Original.” Only the first occurrence will be identified as Original.

Source: Website