The HashSet<T> and SortedSet<T> classes in the System.Collections.Generic namespace define two ways of storing and iterating over a collection of objects in C#. Both classes have pros and cons, so which one should we use?

This article compares the two collection types and explains when it’s appropriate to use each.

To download the source code for this article, you can visit our GitHub repository.

So without further ado, let’s get started!

What Is a HashSet in C#?

A HashSet is a collection of unique elements that uses a hash table for storage, which makes lookups faster than in collections that require a linear search, such as lists. Adding and removing elements also takes constant time on average. However, a HashSet does not guarantee any particular element order and does not support access by index.
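As a minimal sketch of this behavior (the values are arbitrary illustration data, and the snippet assumes a C# 9+ project that allows top-level statements), we can see how a HashSet<int> treats duplicates and lookups:

```csharp
using System;
using System.Collections.Generic;

// A small HashSet<int>; the values are arbitrary illustration data.
var set = new HashSet<int> { 3, 1, 2 };

bool addedNew = set.Add(4);        // 4 is not in the set yet, so Add returns true
bool addedDuplicate = set.Add(3);  // 3 is already present, so Add returns false
bool hasTwo = set.Contains(2);     // hash-based lookup, O(1) on average
bool removed = set.Remove(1);      // Remove returns true when the element existed

Console.WriteLine($"{addedNew} {addedDuplicate} {hasTwo} {removed} Count={set.Count}");
```

The return values of Add() and Remove() tell us whether the set actually changed, which saves a separate Contains() call.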

To learn more about how HashSet works in C#, you can refer to this article.

What Is a SortedSet in C#?

A SortedSet in C# is a collection of unique elements kept in sorted order according to their natural ordering or a specified comparer. It allows for fast retrieval of elements and efficient operations on subsets.

One notable feature of the SortedSet is its ability to efficiently perform set operations such as union, intersection, and difference, which makes it particularly useful for tasks such as creating a list of unique elements from multiple sets.
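A short sketch of these set operations, using two hypothetical sets named first and second. We copy first before each operation because UnionWith(), IntersectWith(), and ExceptWith() modify the set they are called on:

```csharp
using System;
using System.Collections.Generic;

// Two small sets with some overlap; the values are illustration data.
var first = new SortedSet<int> { 1, 2, 3, 4 };
var second = new SortedSet<int> { 3, 4, 5, 6 };

// Each operation mutates its target, so we work on copies of 'first'.
var union = new SortedSet<int>(first);
union.UnionWith(second);             // elements in either set

var intersection = new SortedSet<int>(first);
intersection.IntersectWith(second);  // elements in both sets

var difference = new SortedSet<int>(first);
difference.ExceptWith(second);       // elements only in 'first'

Console.WriteLine("Union: " + string.Join(", ", union));
Console.WriteLine("Intersection: " + string.Join(", ", intersection));
Console.WriteLine("Difference: " + string.Join(", ", difference));
```

These methods come from the ISet<T> interface, so the same calls also work on a HashSet<T>; the difference is that the SortedSet<T> results enumerate in sorted order.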

A SortedSet does not permit duplicate elements. Trying to add a duplicate element does not affect the set, and the Add() method returns false.
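The following sketch illustrates the ordering and duplicate handling described above. The sample words, the choice of StringComparer.Ordinal, and the descending comparer are assumptions made for illustration only:

```csharp
using System;
using System.Collections.Generic;

// Ordinal comparison keeps the ordering independent of the current culture.
var words = new SortedSet<string>(StringComparer.Ordinal) { "pear", "apple", "fig" };

bool addedAgain = words.Add("apple");  // duplicate: returns false, set unchanged

// Elements come back in sorted order, not insertion order.
string ascending = string.Join(", ", words);

// A custom comparer reverses the ordering for this second set.
var descendingSet = new SortedSet<int>(
    Comparer<int>.Create((a, b) => b.CompareTo(a))) { 1, 3, 2 };
string descending = string.Join(", ", descendingSet);

Console.WriteLine(ascending);
Console.WriteLine(descending);
```

Passing a comparer to the constructor is the way to change the sort order; the set keeps using that comparer for every later insertion and lookup.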

Please refer to this article to learn more about how SortedSet works in C#.

Similarities Between a HashSet and a SortedSet

Both HashSet<T> and SortedSet<T> objects store only unique elements, and both implement the ISet<T> interface.

Both data structures can check whether an element exists faster than collections that require a linear search, such as lists and arrays.

Differences Between a HashSet and a SortedSet

HashSet and SortedSet objects use different data structures to store data.   

The HashSet<T> class uses a hash table to organize its elements. In contrast, the SortedSet<T> class uses a self-balancing binary search tree to keep its elements in sorted order. As a result, lookups in a SortedSet<T> are slower than in a HashSet<T>, but its elements can be enumerated in sorted order without extra work.

A SortedSet<T> can retrieve a subset of elements within a specific range using the SortedSet<T>.GetViewBetween(T, T) method, which is not available in the HashSet<T> class. 

Additionally, SortedSet<T> allows us to retrieve the minimum and maximum elements through its SortedSet<T>.Min and SortedSet<T>.Max properties, while the HashSet<T> class lacks them. It can do this because it keeps its elements ordered by an IComparer<T>, either the default comparer for T or one we pass to the constructor.
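As a small sketch of these members (the scores set and its values are hypothetical), note that both bounds passed to GetViewBetween() are inclusive:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical data: a handful of scores inserted out of order.
var scores = new SortedSet<int> { 40, 10, 50, 20, 30 };

int min = scores.Min;  // smallest element
int max = scores.Max;  // largest element

// Both bounds are inclusive, so the view holds 20, 30 and 40.
SortedSet<int> middle = scores.GetViewBetween(20, 40);

Console.WriteLine($"Min={min}, Max={max}");
Console.WriteLine("Between 20 and 40: " + string.Join(", ", middle));
```

With a HashSet<int>, we would have to scan or sort the elements ourselves to answer the same questions.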

Let’s analyze how HashSet and SortedSet objects perform theoretically and then use benchmarks to test those theories.

Big O Analysis

Big O analysis describes how the running time of an algorithm grows with the size of its input. It usually gives an upper bound on the runtime, although for hash-based collections the commonly quoted O(1) figures are average-case costs.

So, how do HashSet and SortedSet objects perform?

We can use Big O analysis to assess how the data structures perform when handling crucial CRUD operations:

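| Operation                            | HashSet    | SortedSet |
|--------------------------------------|------------|-----------|
| Iteration                            | O(N)       | O(N)      |
| Search                               | O(1)       | O(Log N)  |
| Insertion                            | O(1)       | O(Log N)  |
| Removing Elements                    | O(1)       | O(Log N)  |
| Enumerating Elements in Sorted Order | O(N Log N) | O(N)      |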

The HashSet<T> and SortedSet<T> classes perform similarly when iterating through elements: both must visit every element, so iteration is an O(N) operation.

The HashSet<T> is faster than a SortedSet<T> when searching for elements. It has an average constant-time lookup of O(1), while the SortedSet<T> has to walk down its balanced tree from the root, which takes O(log N) comparisons.

When adding or removing elements, the HashSet<T> class also performs better than the SortedSet<T> class. The HashSet<T> class uses a hash table to organize its elements, so inserting an element is an O(1) operation on average. On the other hand, the SortedSet<T> uses a binary tree that starts at a root node, where every node can have a left and a right child. Since the SortedSet<T> class keeps its elements in the correct order, every insertion has to find the right location for the new element in the tree, making it an O(log N) operation.

The HashSet<T> class performs poorly when we need its elements in sorted order, as it stores them in no particular order. We have to sort them first, for example by copying them to a different collection such as a list, which takes O(N log N) time. On the other hand, the elements in a SortedSet<T> object are already in the correct order; hence, enumerating them in sorted order takes O(N) time.
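To make the O(log N) search cost more tangible, here is a rough sketch that counts how many comparisons a single Contains() call performs. The counting comparer is a hypothetical helper written for this illustration, and the exact count depends on the runtime’s tree implementation:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: a comparer that counts how often it is invoked.
long comparisons = 0;
var countingComparer = Comparer<int>.Create((a, b) =>
{
    comparisons++;
    return a.CompareTo(b);
});

const int size = 1_000_000;
var sortedSet = new SortedSet<int>(Enumerable.Range(0, size), countingComparer);

comparisons = 0;  // ignore the comparisons made while building the set
bool found = sortedSet.Contains(size - 1);
long lookupComparisons = comparisons;

Console.WriteLine($"Found: {found}, comparisons for one lookup: {lookupComparisons}");
Console.WriteLine($"log2(N) is roughly {Math.Log(size, 2):F1}");
```

For a million elements, we should see a few dozen comparisons at most rather than anything close to a million, which is what we expect from a balanced tree.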

Without further ado, let’s test how these classes perform!

Performance Benchmarks

First, let’s implement a function that returns a list of integers. We are going to use it to initialize our SortedSet and HashSet objects:

private List<int> RandomInts(int size)
{
    var rand = new Random();
    var numbers = new List<int>();

    for (int i = 0; i < size - 1; i++)
    {
        numbers.Add(rand.Next());
    }

    numbers.Add(Int32.MaxValue - 1);

    return numbers;
}

Our RandomInts() method takes the number of integers to generate as its sole input. It uses the built-in Random class to generate size - 1 random numbers and adds each value to a List<int> object. Finally, it appends the value Int32.MaxValue - 1, which we are going to search for later, and returns the list.

Next, let’s initialize our SortedSet and HashSet objects by using our RandomInts() method:

private readonly List<int> _numList;
private readonly HashSet<int> _hashSet;
private readonly SortedSet<int> _sortedSet;
private readonly int _searchValue;

public Operations()
{
    _numList = RandomInts(1000000);
    _hashSet = InitializeIntHashSet();
    _sortedSet = InitializeIntSortedSet();
    _searchValue = Int32.MaxValue - 1;
}

public HashSet<int> InitializeIntHashSet() 
{
    var hashSet = new HashSet<int>();

    foreach (var number in _numList) 
    {
        hashSet.Add(number);
    }

    return hashSet;
}

public SortedSet<int> InitializeIntSortedSet()
{
    var sortedSet = new SortedSet<int>();

    foreach (var number in _numList)
    {
        sortedSet.Add(number);
    }

    return sortedSet;
}

Here, we populate the HashSet<int> and SortedSet<int> objects with one million integers, which exercises both iterating over the list and inserting elements into the sets. However, we must remember that these data structures hold unique elements; hence, the objects may have fewer values if _numList contains duplicates.
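As a side note, both classes also have a constructor that accepts an existing collection. The sketch below is not what the benchmark above measures; it only shows, with a small hypothetical list, how duplicates collapse when a set is built:

```csharp
using System;
using System.Collections.Generic;

// A small list with duplicates; the values are illustration data.
var numbers = new List<int> { 5, 3, 5, 1, 3 };

// Both constructors copy the elements and drop the duplicates.
var hashSet = new HashSet<int>(numbers);
var sortedSet = new SortedSet<int>(numbers);

Console.WriteLine($"List: {numbers.Count}, HashSet: {hashSet.Count}, SortedSet: {sortedSet.Count}");
Console.WriteLine("SortedSet order: " + string.Join(", ", sortedSet));
```

The list holds five values, but each set ends up with only the three distinct ones.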

Search and Remove Elements From the HashSet and SortedSet

Let’s check whether an element exists in our SortedSet and HashSet objects by invoking the Contains() method:

public bool SearchSortedSet()
{
    return _sortedSet.Contains(_searchValue);
}

public bool SearchHashSet()
{
    return _hashSet.Contains(_searchValue);
}

We expect these methods to return true as _searchValue is present in our SortedSet and HashSet objects.

Next, let’s implement functions to remove elements by invoking the Remove() method:

public HashSet<int> RemoveElementFromHashSet()
{
    _hashSet.Remove(_searchValue);

    return _hashSet;
}

public SortedSet<int> RemoveElementFromSortedSet()
{
    _sortedSet.Remove(_searchValue);

    return _sortedSet;
}

Since both the HashSet<T> and SortedSet<T> classes support the RemoveWhere(Predicate<T>) method, let’s implement these functions:

public HashSet<int> RemoveWhereFromHashSet()
{
    _hashSet.RemoveWhere(IsOdd);

    return _hashSet;
}

public SortedSet<int> RemoveWhereFromSortedSet()
{
    _sortedSet.RemoveWhere(IsOdd);

    return _sortedSet;
}

private bool IsOdd(int num)
{
    // Comparing with != 0 also catches negative odd numbers, since -3 % 2 is -1 in C#
    return num % 2 != 0;
}

We use the IsOdd() method as our predicate for the RemoveWhere() method, which removes all the odd numbers from both objects. 
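Here is a small sketch of RemoveWhere() on hypothetical data. It also shows that the method returns the number of removed elements. The predicate compares the remainder with != 0 so that negative odd values match as well, because -3 % 2 evaluates to -1 in C#:

```csharp
using System;
using System.Collections.Generic;

// Illustration data that includes negative numbers.
var hashSet = new HashSet<int> { -3, -2, 1, 2, 3, 4 };
var sortedSet = new SortedSet<int>(hashSet);

// In C#, -3 % 2 is -1, so we test for "not zero" rather than "equals one".
Predicate<int> isOdd = num => num % 2 != 0;

int removedFromHashSet = hashSet.RemoveWhere(isOdd);
int removedFromSortedSet = sortedSet.RemoveWhere(isOdd);

Console.WriteLine($"Removed {removedFromHashSet} and {removedFromSortedSet} elements");
Console.WriteLine("Remaining: " + string.Join(", ", sortedSet));
```

The benchmark data only contains non-negative values because Random.Next() never returns a negative number, so this distinction matters only for other inputs.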

How to Enumerate Elements in Sorted Order

Sometimes, we need the elements of our SortedSet or HashSet objects in sorted order:

public List<int> SortHashSetElements()
{
    var sortElements = _hashSet.OrderBy(element => element).ToList();

    return sortElements;
}

public List<int> SortSortedSetElements()
{
    return _sortedSet.ToList();
}

Since the elements in the _sortedSet are in the correct order, we invoke the ToList() method to enumerate them.

For _hashSet, however, we use the OrderBy() method to sort its elements in ascending order.

Alternatively, we can convert the _hashSet object into a different collection, such as an array, and then sort it:

var array = _hashSet.ToArray();
Array.Sort(array);
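The sketch below, which uses a small hypothetical set instead of the benchmark fields, checks that all three approaches produce the same ascending sequence:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustration data; a HashSet gives no ordering guarantee on its own.
var hashSet = new HashSet<int> { 42, 7, 19, 3 };
var sortedSet = new SortedSet<int>(hashSet);

// Option 1: LINQ OrderBy() on the HashSet.
List<int> viaOrderBy = hashSet.OrderBy(element => element).ToList();

// Option 2: copy to an array and sort it in place.
int[] viaArraySort = hashSet.ToArray();
Array.Sort(viaArraySort);

// Option 3: the SortedSet is already ordered, so ToList() is enough.
List<int> viaSortedSet = sortedSet.ToList();

Console.WriteLine(string.Join(", ", viaOrderBy));
Console.WriteLine(viaOrderBy.SequenceEqual(viaArraySort) && viaOrderBy.SequenceEqual(viaSortedSet));
```

The first two options pay the sorting cost at the moment we ask for the ordered elements, while the SortedSet<T> has already paid it during insertion.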

Performance Results

Finally, let’s assess how the SortedSet and HashSet objects perform when performing our CRUD operations:

|                     Method |                 Mean |             StdDev |               Median | Allocated |
|--------------------------- |---------------------:|-------------------:|---------------------:|----------:|
|       InitializeIntHashSet |   147,134,671.465 ns |  14,192,012.583 ns |   150,572,175.000 ns |43111216 B |
|     InitializeIntSortedSet | 1,285,055,140.000 ns | 127,630,638.873 ns | 1,276,749,550.000 ns |39991392 B |
|                            |                      |                    |                      |           |
|   RemoveElementFromHashSet |            10.007 ns |           2.159 ns |             9.582 ns |         - |
| RemoveElementFromSortedSet |           287.854 ns |          37.483 ns |           278.201 ns |         - |
|                            |                      |                    |                      |           |
|     RemoveWhereFromHashSet |     7,580,281.445 ns |     415,922.431 ns |     7,480,238.281 ns |      70 B |
|   RemoveWhereFromSortedSet |    49,730,196.986 ns |   5,309,548.323 ns |    47,545,329.167 ns | 4098112 B |
|                            |                      |                    |                      |           |
|              SearchHashSet |             8.271 ns |           1.907 ns |             7.558 ns |         - |
|            SearchSortedSet |           118.102 ns |           1.664 ns |           117.925 ns |         - | 
|                            |                      |                    |                      |           |   
|      SortSortedSetElements |    87,276,315.625 ns |   1,674,766.333 ns |    86,826,458.333 ns | 4001217 B |
|        SortHashSetElements |   492,307,633.000 ns |  89,256,871.787 ns |   492,247,450.000 ns |15997040 B |

We can see that the HashSet<T> class is faster than the SortedSet<T> class when searching for, adding, and removing elements. The HashSet<T> class performs better in these operations because its underlying data structure is a hash table, whereas a SortedSet<T> object uses a binary tree.

Although the test results show the SortedSet<T> class performs better than the HashSet<T> class when enumerating elements in sorted order, we should remember that the sorting work takes place while we populate the SortedSet<T>, which is why initializing it is slower than initializing the HashSet<T>. Once the SortedSet<T> object is populated, we simply convert it into a List<T>, while for the HashSet<T> object, we have to order the elements before adding them to a List<T>.
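If we only want a quick sanity check on our own machine, a rough Stopwatch-based sketch like the one below can show the same trend for initialization. It is not a substitute for a proper benchmarking harness: the collection size and seed are arbitrary assumptions, there is no warm-up, and the timings will vary between runs and machines:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Arbitrary size and seed, chosen only to keep the sketch fast and repeatable.
const int size = 200_000;
var rand = new Random(42);
var numbers = new List<int>(size);
for (int i = 0; i < size; i++)
{
    numbers.Add(rand.Next());
}

// Time how long it takes to build each set from the same list.
var stopwatch = Stopwatch.StartNew();
var hashSet = new HashSet<int>(numbers);
stopwatch.Stop();
TimeSpan hashSetBuild = stopwatch.Elapsed;

stopwatch.Restart();
var sortedSet = new SortedSet<int>(numbers);
stopwatch.Stop();
TimeSpan sortedSetBuild = stopwatch.Elapsed;

Console.WriteLine($"HashSet build:   {hashSetBuild.TotalMilliseconds:F2} ms");
Console.WriteLine($"SortedSet build: {sortedSetBuild.TotalMilliseconds:F2} ms");
```

Both sets end up with the same elements; only the time spent organizing them differs.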

Conclusion

So, in this article, we learned the similarities and differences between the HashSet<T> and SortedSet<T> classes in C#.

The HashSet<T> class is the way to go if we need to store unique elements and don’t care about their order. However, the SortedSet<T> class is our best bet when we need unique data kept in sorted order as we add elements.

These collections have different performance characteristics depending on how we intend to use them. The HashSet<T> class is generally faster for lookup, insertion, and removal operations, while the SortedSet<T> class performs better when we enumerate elements in sorted order.
