In this article, we will learn the different methods to remove duplicates from a list in C#. List<T> is a collection having elements of the same type <T> which can contain multiple occurrences of the same item. Sometimes, we want that every element is unique, and here we will check various ways to achieve this goal, comparing their performance.
We already dealt with this problem when the collection is an array, and if you are interested in this topic you can check the Remove Duplicates From a C# Array article.
Let’s start.
Remove Duplicates by Using LINQ Methods
We begin by creating a list containing some duplicates:
List<int> listWithDuplicates = new List<int>() { 1, 2, 3, 4, 1, 2, 3, 4 };
We want to make the listWithDuplicates
contain every number only once { 1, 2, 3, 4 }
.
Using Distinct()
The first method we use is the LINQ Distinct()
:
listWithDuplicates.Distinct().ToList();
The Distinct()
method does exactly what we need – removes the duplicates. It returns an IEnumerable<int>
result and that’s why we need to perform ToList()
if we need a list.
Using GroupBy() and Select() Combo
Going forward with LINQ methods, we can also perform:
listWithDuplicates.GroupBy(x => x).Select(d => d.First()).ToList();
In this case, we first group equal elements in listWithDuplicates
using the GroupBy()
method. For each group of equal elements, we return the first one with Select()
. Since this operation returns an IEnumerable<int>
, we invoke ToList()
at the end.
Using Union()
The Union()
method creates an union between two sequences, using the default equality comparer. The union of two lists contains the elements from both, excluding duplicates:
listWithDuplicates.Union(listWithDuplicates).ToList();
Here, we perform the union between listWithDuplicates
and itself. This operation produces a IEnumerable<T>
without duplicates and we convert into a List<T>
with ToList()
.
Remove Duplicates From a List in C# by Using Other Collections
Up to now, we focused on the LINQ methods, but we can also recur to other collections to remove duplicates from a List<T>
.
HashSet
HashSet<T>
represents a unordered collection that contains no duplicate elements:
listWithDuplicates.ToHashSet().ToList();
In this case, we convert the list into a HashSet<T>
and this operation returns an IEnumerable<T>
without duplicates and then, we convert it into a List<T>
.
Another interesting use of HashSet<T>
is to initialize a Hashet from listWithDuplicates
and then converting back to a list:
new HashSet<T>(listWithDuplicates).ToList();
In this case, when we initialize the HashSet
, the compiler deletes the multiple values.
Dictionary
A Dictionary<TKey,TValue>
maps a set of keys to a set of values:
var dict = new Dictionary<T, int>(); foreach (var s in listWithDuplicates) { dict.TryAdd(s, 1); } var distinctList = dict.Keys.ToList();
In the foreach loop, we call TryAdd()
, that adds an item to the keys of the dictionary only if it is not already contained, since a key in a dictionary must be unique. When we exit from the loop, we take the keys and convert the collection to a list.
Empty List
We can also add the unique elements to another list:
var listWithoutDuplicates = new List<T>(); foreach (T item in listWithDuplicates) { if (!listWithoutDuplicates.Contains(item)) //we can also use !listWithoutDuplicates.Any(x => x.Equals(item)) { listWithoutDuplicates.Add(item); } }
We check if listWithoutDuplicates
already contains an item in the foreach loop, and if it doesn’t we add it. To verify it, we use Contains()
LINQ method, but we can also use Any()
and Equal()
comparer and we will see if this difference impact performance in the related section.
Iterating Over The List
We can remove duplicate elements using iterations:
var n = listWithDuplicates.Count; for (int i = 0; i < n; i++) { for (int j = i + 1; j < n; j++) { if (listWithDuplicates.ElementAt(i).Equals(listWithDuplicates.ElementAt(j))) { for (int k = j; k < n - 1; k++) { T item = listWithDuplicates.ElementAt(k); item = listWithDuplicates.ElementAt(k + 1); } j--; n--; } } } var distinctList = listWithDuplicates.Take(n).ToList();
Starting from index i = 0
, we evaluate all the items starting from the index j = i + 1
. If we find that i
-th element is equal to the j
-th element, then we shift every item to the left starting from j
to n = listWithDuplicates.Count - 1
.
Every time we find a duplicate, the size of the final list decreases by one. When we get out of the loops we will take the first n
distinct items contained in listWithDuplicates
with Take()
method.
Iterations With Swapping
As already suggested in the article for the removal of duplicates from the array, the iterative method can be improved with the swapping of elements in the loop:
var size = listWithDuplicates.Count; for (int i = 0; i < size; i++) { for (int j = i + 1; j < size; j++) { if (listWithDuplicates.ElementAt(i)!.Equals(listWithDuplicates.ElementAt(j))) { size--; T jThItem = listWithDuplicates.ElementAt(j); jThItem = listWithDuplicates.ElementAt(size); j--; } } } listWithDuplicates.Take(size).ToList();
We swap two nearest items if they are equal, and we decrease by one the size of the list every time this happens. When we exit from the loop, we take the first size
items of listWithDuplicates
. While similar to the previous approach, there are some performance implications that we’re going to explore at the end.
Recursive Approach
The recursion provides another considerable way to perform the removal of duplicates:
public List<T> UsingRecursion(List<T>? listWithDuplicates, List<T>? listWithoutDuplicates = default, int index = 0) { if (listWithoutDuplicates == null) { listWithoutDuplicates = new List<T>(); } if (index >= listWithDuplicates.Count) { return listWithDuplicates; } if (listWithoutDuplicates.IndexOf(listWithDuplicates[index]) < 0) { listWithoutDuplicates.Add(listWithDuplicates[index]); } UsingRecursion(listWithDuplicates, listWithoutDuplicates, index + 1); return listWithoutDuplicates.ToList(); }
In this case, we initialize listWithoutDuplicates
as an empty list if it is null
, and we check if the list contains an item that occupies the position given by index
using the IndexOf()
method.
If listWithoutDuplicates
doesn’t contain the element, we add it to another list and we call again the UsingRecursion()
method passing index
incremented by one. The recursion stops when the passed value of index
is greater than or equal to the size of the list.
Sorting
If we sort the list, we can compare elements two by two:
var listWithoutDuplicates = new List<T>(); listWithDuplicates = listWithDuplicates.OrderBy(x => x).ToList(); T element = default; foreach (T result in listWithDuplicates) { if (!result.Equals(element)) { listWithoutDuplicates.Add(result); element = result; } }
We start by defining a new empty listWithoutDuplicates
, sorting listWithDuplicates
in ascending order, and defining the variable element
with the default value. Then we loop over the items and compare everyone with the element
adding it to listWithoutDuplicates
only if it doesn’t contain it yet. After the eventual addition, we assign to element
the current value in the loop.
Performance Comparison
We want to show the differences in terms of performance between the methods explored so far, using BenchmarkDotNet:
| Method | Mean | Error | StdDev | Rank | Gen 0 | Allocated | |---------------------------- |-----------:|----------:|----------:|-----:|-------:|----------:| | EmptyListWithContainsMethod | 1.213 us | 0.0115 us | 0.0107 us | 1 | 0.0172 | 72 B | | DictionaryMethod | 1.338 us | 0.0213 us | 0.0189 us | 2 | 0.0687 | 288 B | | ConvertToHashSetMethod | 2.409 us | 0.0482 us | 0.0610 us | 3 | 0.9918 | 4,160 B | | DistinctLINQMethod | 2.424 us | 0.0431 us | 0.0382 us | 3 | 1.0071 | 4,224 B | | InitializingHashSetMethod | 2.446 us | 0.0477 us | 0.0823 us | 3 | 0.9918 | 4,160 B | | UnionLINQMethod | 3.981 us | 0.0401 us | 0.0313 us | 4 | 0.0916 | 392 B | | GroupByLINQMethod | 4.653 us | 0.0428 us | 0.0526 us | 5 | 0.8698 | 3,664 B | | RecursiveMethod | 7.302 us | 0.1636 us | 0.4507 us | 6 | 3.4561 | 14,472 B | | SortMethod | 7.756 us | 0.1248 us | 0.1622 us | 7 | 1.9989 | 8,376 B | | IterationsAndSwappingMethod | 9.103 us | 0.1149 us | 0.1018 us | 8 | 1.1597 | 4,888 B | | EmptyListWithAnyMethod | 11.466 us | 0.2288 us | 0.5022 us | 9 | 8.5297 | 35,680 B | | IterationsAndShiftingMethod | 389.509 us | 5.3991 us | 6.2176 us | 10 | 0.9766 | 4,888 B |
The results show how the method which uses EmptyList
with Contains()
has better performance in terms of the time of execution and it is also the one that needs less allocated memory.
It’s interesting to outline the difference between two iterative methods, that differ for the technique adopted to remove duplicates, despite both performing at least two loops. In fact, IterationsAndSwappingMethod()
has a sharply shorter mean time of execution than IterationsAndShiftingMethod()
, even if they both have the same memory allocation needed.
Moreover, the conversion of the list into a HashSet
has a slightly better mean execution time than the initialization of a new HashSet
from the list, even if they are equal in terms of the allocated memory.
Conclusions
We’ve learned several ways to remove duplicates from a list in C#, and we’ve explored the difference in their performance.
Hi, is there a way to delete distinctby method?
I have more than 10,000 records in a table that records the status of some cases. Each update creates a record with the same case but different update.
With linq I first have to download the 10000 records and when it’s already an ienumerable list I can just use distinctby.
With morelinq, the discinctby is done before making the list, so I only download about 100 records. it is very easy, but with your method now it is very complicated.
Anyway, what I want to do is use morelinq and delete the lina method
Your benchmarks should also vary the number of elements in the list. What’s fastest for 8 elements is likely to not be fastest for 80,000 elements.
Thanks for the suggestion. We have the source code provided for this article, so anyone can download and play a bit with the values. We’ve just put some reference results in the article.