In this article, we walk through a practical guide on how to compare two lists in C# through one property. We explore several methods and conclude with an evaluation benchmark to find the one that best suits our needs.

To download the source code for this article, you can visit our GitHub repository.

Let’s dive in!

Create Comparison Application

To start, let’s create a new console app by running the command dotnet new console -n App in the command window. In our example scenario, we have a Customer class with a unique Id property and an Order class with a CustomerId property that relates one entity to the other.

Let’s start with the Customer class:

public class Customer
{
    public int Id { get; set; }
    public string Firstname { get; set; }
    public string Surname { get; set; }
}

And now let’s implement a simple Order class:

public class Order
{
    public int CustomerId { get; set; }
    public int OrderId { get; set; }    
}

Before we test our methods, let’s populate two lists with sample records:

var customers = new List<Customer>()
{
    new() { Id = 1, Firstname = "Alice", Surname = "Smith"},
    new() { Id = 2, Firstname = "John", Surname = "Terry"},
    new() { Id = 3, Firstname = "Fred", Surname = "Staton"}
};

var orders = new List<Order>()
{
    new () {CustomerId = 1, OrderId = 101},
    new () {CustomerId = 2, OrderId = 102},
    new () {CustomerId = 2, OrderId = 103}
};

We initialize a List of Customer type and a List of Order type, each one with three instances. The CustomerId property in the Order class connects with the Id property of a corresponding customer in the Customer list, forming a relationship between customers and their orders. The objective is to obtain a list of distinct customers who have placed orders.

Use of Foreach Loops to Compare Two Lists

Let’s start by implementing a method that uses two foreach loops:

public static List<Customer> ForEachMethod(List<Customer> customerList, List<Order> orderList) 
{ 
    var customersWithOrders = new List<Customer>();

    foreach (var customer in customerList)
    {
        foreach (var order in orderList)
        {
            if (customer.Id == order.CustomerId && !customersWithOrders.Contains(customer))
            {
                customersWithOrders.Add(customer);
            }
        }
    }

    return customersWithOrders;
}

Every method we implement takes two parameters, a List<Customer> and a List<Order>, in that order. The outcome is always a List<Customer> containing the customers who have placed orders.

Firstly, we initialize an empty List<Customer>, customersWithOrders, to hold the customers we identify. Then, using nested foreach loops, we iterate through each customer in the customerList and each order in the orderList. For each customer-order pair, we check if the customer’s Id matches the order’s CustomerId and if the customer does not already exist in the customersWithOrders list.

When both conditions hold, we add the customer to the customersWithOrders list. We repeat this process until we have examined all combinations, and then we return the customersWithOrders list. While this approach accomplishes our objective, it’s worth noting that every customer is compared with every order, so the time complexity is quadratic, O(n²), where ‘n’ represents the total number of elements in both collections.
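As an illustration of where the nested loops spend their time, here is a minimal sketch of a variant that stops scanning the orders as soon as it finds the first match. The name ForEachWithBreak is hypothetical and not part of the article's source code, and the sketch assumes that customerList holds no duplicate entries, which is what lets it drop the Contains() check. The worst case is unchanged, because a customer without orders still forces a full pass over orderList:

```csharp
using System.Collections.Generic;

public class Customer
{
    public int Id { get; set; }
    public string Firstname { get; set; } = "";
    public string Surname { get; set; } = "";
}

public class Order
{
    public int CustomerId { get; set; }
    public int OrderId { get; set; }
}

public static class ForEachSketch
{
    // Hypothetical variant of ForEachMethod(), assuming customerList has no duplicates
    public static List<Customer> ForEachWithBreak(List<Customer> customerList, List<Order> orderList)
    {
        var customersWithOrders = new List<Customer>();

        foreach (var customer in customerList)
        {
            foreach (var order in orderList)
            {
                if (customer.Id == order.CustomerId)
                {
                    customersWithOrders.Add(customer);

                    // One matching order is enough, so we skip the remaining orders
                    break;
                }
            }
        }

        return customersWithOrders;
    }
}
```

This variant was not part of the benchmark later in the article, so its relative performance there is unknown.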

Finally, to check the result, let’s present the outcome list:

var customerWithOrders = ListCompareMethods.ForEachMethod(customers, orders);
Console.WriteLine(string.Join(',',customerWithOrders.Select(i=>i.Firstname)));

We use the Select() method to project each Customer object in the customerWithOrders list to its Firstname. Then, string.Join() concatenates these values into a single comma-separated string, which we print to the console:

Alice,John

These are the two customers we expect, since the orders list only references the customers with Id 1 and 2.

LINQ and Where Extension

In this method, let’s make use of LINQ and the Where() extension:

public static List<Customer> WhereAnyMethod(List<Customer> customerList, List<Order> orderList)
{
    return customerList.Where(y => orderList.Any(z => z.CustomerId == y.Id)).ToList();
}

Here we utilize LINQ to filter the customerList based on a specific condition. For each Customer in customerList, we check if there is any Order in the orderList where the CustomerId of the Order matches the Id of the Customer.

If such an Order exists, we include the Customer in the result. Before we return it, we convert the filtered result to a List<Customer> type.

Use LINQ Query Syntax to Compare Lists

Let’s now implement the next solution with LINQ query syntax. Despite its name, JoinMethod() does not use the join clause; it filters the customers with a where clause:

public static List<Customer> JoinMethod(List<Customer> customerList, List<Order> orderList)
{
    var customersWithOrders = (from customer in customerList
                               where orderList.Any(order => customer.Id == order.CustomerId)
                               select customer
                               ).ToList();

    return customersWithOrders;
}

Here, we use LINQ to iterate through each element in customerList using the from clause. Then, we use the where clause to filter customers based on the existence of any Order in orderList whose CustomerId matches the Id of the current Customer.

The select clause then determines what is included in the result, selecting the Customer object that satisfies the filtering condition. Finally, we enclose the entire LINQ query in parentheses and call the ToList() method to convert the result into a materialized List<Customer>.
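The query above filters with where and Any() rather than with a join clause. For comparison, here is a minimal sketch of how the same lookup might be written with the join clause of query syntax. The name QueryJoin is hypothetical, and this variant is not one of the benchmarked methods. Because a customer with several orders appears once per matching order, the sketch calls Distinct() before materializing the list:

```csharp
using System.Collections.Generic;
using System.Linq;

public class Customer
{
    public int Id { get; set; }
    public string Firstname { get; set; } = "";
    public string Surname { get; set; } = "";
}

public class Order
{
    public int CustomerId { get; set; }
    public int OrderId { get; set; }
}

public static class QueryJoinSketch
{
    // Hypothetical query-syntax join; yields one row per customer-order match
    public static List<Customer> QueryJoin(List<Customer> customerList, List<Order> orderList)
    {
        return (from customer in customerList
                join order in orderList on customer.Id equals order.CustomerId
                select customer)
               .Distinct() // collapse the repeated Customer instances
               .ToList();
    }
}
```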

Compare Lists With Join Extension Method

Let’s use the LINQ Join() extension method to find the customers with orders:

public static List<Customer> JoinListMethod(List<Customer> customerList, List<Order> orderList)
{
    return customerList.Join(
            orderList,
            customer => customer.Id,
            order => order.CustomerId,
            (customer, order) => customer
        ).Distinct().ToList();
}

Here, we perform an inner join operation between two lists, customerList and orderList. The method uses LINQ to join the lists based on matching keys: the Id property of each Customer in customerList and the CustomerId property of each Order in orderList. The result of the join is a sequence of paired elements, where each pair consists of a Customer and the corresponding Order.

The result selector (customer, order) => customer specifies that only the Customer part of each pair is included in the final result. Then, we use the Distinct() method to ensure that each Customer appears only once in the result, removing any duplicates.

Finally, the result is converted into a List<Customer> using the ToList() method.
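One detail worth keeping in mind is that Distinct() without a comparer relies on default equality, which for a class such as Customer that does not override Equals() means reference equality. That is sufficient here, because the join returns the same Customer instance for every matching order. If the customer list could contain separate instances that share an Id, a sketch like the following could deduplicate by key instead. It assumes .NET 6 or later, where DistinctBy() is available, and the name JoinDistinctById is hypothetical:

```csharp
using System.Collections.Generic;
using System.Linq;

public class Customer
{
    public int Id { get; set; }
    public string Firstname { get; set; } = "";
    public string Surname { get; set; } = "";
}

public class Order
{
    public int CustomerId { get; set; }
    public int OrderId { get; set; }
}

public static class JoinDistinctSketch
{
    // Hypothetical variant of JoinListMethod() that deduplicates by Id (assumes .NET 6+)
    public static List<Customer> JoinDistinctById(List<Customer> customerList, List<Order> orderList)
    {
        return customerList.Join(
                orderList,
                customer => customer.Id,
                order => order.CustomerId,
                (customer, order) => customer
            ).DistinctBy(customer => customer.Id).ToList();
    }
}
```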

Use HashSet to Compare the Lists

The next solution builds a HashSet and uses it to retrieve the result we desire:

public static List<Customer> HashSetMethod(List<Customer> customerList, List<Order> orderList)
{
    var customerIds = orderList.Select(i => i.CustomerId).ToHashSet();

    return customerList.Where(i => customerIds.Contains(i.Id)).ToList();
}

This time, we create a HashSet collection, customerIds, containing the unique CustomerId values that we extract from the orderList. A HashSet stores only distinct values and offers containment checks that take constant time on average.

Next, the method filters the customerList using the Where() method. It includes in the result only those customers whose Id is present in the customerIds HashSet. Finally, we convert the result into a List<Customer> using the ToList() method.
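The same idea works for any pair of lists that share a key. As a sketch only, the helper below generalizes it with key selector delegates. The names KeyFilterSketch and WhereKeyIn are hypothetical and do not appear in the article's source code:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class KeyFilterSketch
{
    // Hypothetical helper: keeps the items of 'left' whose key also occurs in 'right'
    public static List<TLeft> WhereKeyIn<TLeft, TRight, TKey>(
        IEnumerable<TLeft> left,
        IEnumerable<TRight> right,
        Func<TLeft, TKey> leftKey,
        Func<TRight, TKey> rightKey)
    {
        // Build the lookup set once, so that each check below is a single hash lookup
        var keys = new HashSet<TKey>(right.Select(rightKey));

        return left.Where(item => keys.Contains(leftKey(item))).ToList();
    }
}
```

With the lists from our example, the call would look like KeyFilterSketch.WhereKeyIn(customers, orders, c => c.Id, o => o.CustomerId).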

Now, let’s compare our methods by running a set of benchmarks.

Benchmark Set Up

We proceed by benchmarking these methods in terms of speed and memory allocation. In our benchmark class, let’s add a setup method and two helper methods that generate the test data:

private List<Customer>? _customers;
private List<Order>? _orders;

[GlobalSetup]
public void GlobalSetup()
{
    var numberOfCustomers = 10000;
    var numberOfOrders = 500000;

    _customers = GenerateRandomCustomers(numberOfCustomers).ToList();
    _orders = GenerateRandomOrders(numberOfOrders, _customers).ToList();
}

private static IEnumerable<Customer> GenerateRandomCustomers(int count)
{
    return Enumerable.Range(1, count)
        .Select(i => new Customer
        {
            Id = i,
            Firstname = $"CustomerFirstname{i}",
            Surname = $"CustomerSurname{i}"                    
        });
}

private static IEnumerable<Order> GenerateRandomOrders(int count, List<Customer> customers)
{
    var random = new Random();

    return Enumerable.Range(1, count)
        .Select(i => new Order
        {
            OrderId = i,
            CustomerId = random.Next(1, customers.Count + 1)
        });
}

In our benchmarking class, the GlobalSetup() method prepares the data we need for our performance evaluations. We mark it with the [GlobalSetup] attribute, to execute once before all benchmark methods.

Within it, we initialize _customers and _orders with realistic and randomized datasets. Specifically, we utilize the GenerateRandomCustomers() method to create a list of 10,000 customers, each having unique IDs, first names, and surnames. Subsequently, the _orders list is populated using the GenerateRandomOrders() method, generating 500,000 orders with unique OrderIds and associating each order with a randomly selected customer from the _customers list.

Our GenerateRandomCustomers() method creates a sequence of Customer objects based on the specified count. Using Enumerable.Range() to produce a sequence of integers, we call the Select() method to generate a new Customer object for each integer, which ensures distinct Ids.

Similarly, our GenerateRandomOrders() method generates a sequence of random Order objects, considering the desired count and the list of customers. Using Enumerable.Range() and the Select() method, we create Order objects, setting the CustomerId property of each order to a randomly selected value between 1 and the total count of customers. With this, we establish a valid association between orders and customers.

Together, these helper methods enable us to establish a realistic dataset for benchmarking methods designed to compare and filter lists of customers and orders based on a specific property.
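Before timing the methods, it can be worth confirming on a small dataset that they return the same customers. The following is a minimal sketch of such a check, not part of the article's benchmark project. The class name BenchmarkSanityCheck, its methods, and the seed value 42 are assumptions chosen for illustration. It compares only two of the approaches, Where() with Any() and the HashSet lookup:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Customer
{
    public int Id { get; set; }
    public string Firstname { get; set; } = "";
    public string Surname { get; set; } = "";
}

public class Order
{
    public int CustomerId { get; set; }
    public int OrderId { get; set; }
}

public static class BenchmarkSanityCheck
{
    // Builds a small dataset in the same way as the helper methods above.
    // The fixed seed (an arbitrary value) makes the random orders repeatable.
    public static (List<Customer> Customers, List<Order> Orders) GenerateSample(
        int customerCount, int orderCount, int seed = 42)
    {
        var random = new Random(seed);

        var customers = Enumerable.Range(1, customerCount)
            .Select(i => new Customer { Id = i, Firstname = $"CustomerFirstname{i}", Surname = $"CustomerSurname{i}" })
            .ToList();

        var orders = Enumerable.Range(1, orderCount)
            .Select(i => new Order { OrderId = i, CustomerId = random.Next(1, customerCount + 1) })
            .ToList();

        return (customers, orders);
    }

    // Returns true when both approaches select the same customer Ids in the same order
    public static bool ResultsAgree(List<Customer> customers, List<Order> orders)
    {
        var viaAny = customers
            .Where(c => orders.Any(o => o.CustomerId == c.Id))
            .Select(c => c.Id);

        var orderCustomerIds = orders.Select(o => o.CustomerId).ToHashSet();
        var viaHashSet = customers
            .Where(c => orderCustomerIds.Contains(c.Id))
            .Select(c => c.Id);

        return viaAny.SequenceEqual(viaHashSet);
    }
}
```

Both approaches filter customerList in its original order, so comparing the Id sequences with SequenceEqual() is enough for this check.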

Evaluation Results

Let’s evaluate the benchmark results:

| Method         | Mean          | Error       | StdDev      | Gen0      | Gen1      | Gen2     | Allocated   |
|--------------- |--------------:|------------:|------------:|----------:|----------:|---------:|------------:|
| HashSetMethod  |      3.501 ms |   0.0671 ms |   0.0848 ms |  148.4375 |   82.0313 |  58.5938 |  1308.53 KB |
| JoinListMethod |     35.188 ms |   0.6938 ms |   1.3200 ms | 2400.0000 | 1400.0000 | 333.3333 | 13302.96 KB |
| JoinMethod     |    452.103 ms |   3.1305 ms |   2.6141 ms |         - |         - |        - |  1512.98 KB |
| WhereAnyMethod |    459.200 ms |   9.0506 ms |  10.4227 ms |         - |         - |        - |  1506.85 KB |
| ForEachMethod  | 10,195.861 ms | 162.9539 ms | 217.5387 ms |         - |         - |        - |   318.92 KB |

Among the methods we test, HashSetMethod() stands out as the most efficient: it has the lowest mean execution time, at about 3.5 ms, and it allocates less memory than every other approach except ForEachMethod(). Because Contains() on a HashSet runs in constant time on average, each customer is checked with a single lookup instead of a scan of the order list. Building the set also collapses the 500,000 orders into at most 10,000 distinct customer Ids, which keeps the lookup structure small.

Next, JoinListMethod() ranks second. It is about ten times slower than HashSetMethod() and has by far the largest allocation, at roughly 13 MB. JoinMethod() and WhereAnyMethod() share the third position at around 450 ms, which is expected since both run the same Any() scan over the orders. Finally, ForEachMethod() is the slowest method by a wide margin at roughly 10 seconds, although it allocates the least memory.

These results suggest that leveraging a HashSet for comparison yields optimal performance in this context, offering a compelling solution for scenarios involving the comparison of large lists based on a specific property.

Conclusion

In this article, we have explored and benchmarked various methods for comparing two lists based on a specific property, ranging from traditional iteration approaches to more optimized techniques, providing valuable insights into their respective performance characteristics.
