In this article, we look at practical ways to compare two lists in C# by a single property. We explore several methods and conclude with a benchmark to find the one that best suits our needs.
Let’s dive in!
Create Comparison Application
To start, let's create a new console app by running `dotnet new console -n App` in the command window. In our example scenario, we have a `Customer` class with a unique `Id` property and an `Order` class with a `CustomerId` property that relates one entity to the other.
Let's create the `Customer` class first:
```csharp
public class Customer
{
    public int Id { get; set; }
    public string Firstname { get; set; }
    public string Surname { get; set; }
}
```
And now let's implement a simple `Order` class:
```csharp
public class Order
{
    public int CustomerId { get; set; }
    public int OrderId { get; set; }
}
```
Before we test our methods, let’s populate two lists with sample records:
```csharp
var customers = new List<Customer>()
{
    new() { Id = 1, Firstname = "Alice", Surname = "Smith" },
    new() { Id = 2, Firstname = "John", Surname = "Terry" },
    new() { Id = 3, Firstname = "Fred", Surname = "Staton" }
};

var orders = new List<Order>()
{
    new() { CustomerId = 1, OrderId = 101 },
    new() { CustomerId = 2, OrderId = 102 },
    new() { CustomerId = 2, OrderId = 103 }
};
```
We initialize a `List<Customer>` and a `List<Order>`, each with three instances. The `CustomerId` property of each `Order` refers to the `Id` property of the corresponding `Customer`, forming the relationship between customers and their orders. The objective is to obtain a list of distinct customers who have placed orders.
Use of Foreach Loops to Compare Two Lists
Let's start by implementing a method that uses two nested `foreach` loops:
```csharp
public static List<Customer> ForEachMethod(List<Customer> customerList, List<Order> orderList)
{
    var customersWithOrders = new List<Customer>();

    foreach (var customer in customerList)
    {
        foreach (var order in orderList)
        {
            // Add the customer only once, on the first matching order
            if (customer.Id == order.CustomerId && !customersWithOrders.Contains(customer))
            {
                customersWithOrders.Add(customer);
            }
        }
    }

    return customersWithOrders;
}
```
Every method we implement lives in the `ListCompareMethods` class and takes two parameters, a `List<Customer>` and a `List<Order>`. The outcome is always a `List<Customer>` with the customers who have placed orders.
First, we initialize an empty `List<Customer>`, `customersWithOrders`, to hold the customers we identify. Then, using nested `foreach` loops, we iterate through each customer in `customerList` and each order in `orderList`. For each customer-order pair, we check whether the customer's `Id` matches the order's `CustomerId` and whether the customer is not already in the `customersWithOrders` list.
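When both conditions are met, we add the customer to the `customersWithOrders` list. We repeat this process until we have examined every combination, and then return `customersWithOrders`. While this approach accomplishes our objective, its time complexity is O(n × m), where n is the number of customers and m is the number of orders (roughly O(n²) when the two lists have similar sizes), because the inner loop always walks the entire order list. In addition, every match triggers a linear `Contains()` scan of `customersWithOrders`.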
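Since the `Contains()` check only prevents adding the same customer twice, one alternative is to stop scanning the orders as soon as a customer has a match. The following is a minimal sketch of that variation, assuming the customer list contains no duplicate entries; the class and method names (`NestedLoopSketch`, `MatchWithBreak`) and the generic key selectors are hypothetical, chosen so the snippet compiles without the article's classes:

```csharp
using System;
using System.Collections.Generic;

public static class NestedLoopSketch
{
    // Hypothetical generic variant of ForEachMethod. Assumes outerList holds no
    // duplicate entries, so a matched item can be added without a Contains() check.
    public static List<TOuter> MatchWithBreak<TOuter, TInner, TKey>(
        List<TOuter> outerList,
        List<TInner> innerList,
        Func<TOuter, TKey> outerKey,
        Func<TInner, TKey> innerKey)
    {
        var comparer = EqualityComparer<TKey>.Default;
        var result = new List<TOuter>();

        foreach (var outer in outerList)
        {
            var key = outerKey(outer);
            foreach (var inner in innerList)
            {
                if (comparer.Equals(key, innerKey(inner)))
                {
                    result.Add(outer);
                    break; // one match is enough; skip the remaining inner items
                }
            }
        }

        return result;
    }
}
```

With the article's types, the call would be `NestedLoopSketch.MatchWithBreak(customers, orders, c => c.Id, o => o.CustomerId)`. The worst case is still O(n × m): a customer without orders never triggers the `break`, so its inner loop still visits every order.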
Finally, to check the result, let's print the first names of the returned customers:

```csharp
var customerWithOrders = ListCompareMethods.ForEachMethod(customers, orders);
Console.WriteLine(string.Join(',', customerWithOrders.Select(i => i.Firstname)));
```
Here, `Select()` projects each `Customer` object in `customerWithOrders` to its `Firstname`, and `string.Join()` concatenates those names into a single comma-separated string, which we print to the console:
Alice,John
This is the expected result, since the orders list only references the customers with `Id` 1 and 2. John appears only once, even though he has two orders.
LINQ and Where Extension
In this method, let's make use of LINQ and the `Where()` extension method:
```csharp
public static List<Customer> WhereAnyMethod(List<Customer> customerList, List<Order> orderList)
{
    return customerList
        .Where(customer => orderList.Any(order => order.CustomerId == customer.Id))
        .ToList();
}
```
Here we use LINQ to filter `customerList` based on a condition. For each `Customer` in `customerList`, we check whether there is any `Order` in `orderList` whose `CustomerId` matches the `Id` of the `Customer`. If such an `Order` exists, we include the `Customer` in the result. Before returning, we convert the filtered sequence to a `List<Customer>`.
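Unlike the nested `foreach` loops, `Any()` stops at the first matching order. To make that visible, here is a small sketch that counts how many times the `Any()` predicate runs; the class and method names are hypothetical, and plain `int` lists stand in for the customers' `Id` values and the orders' `CustomerId` values:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class AnyShortCircuitSketch
{
    // Filters customerIds with Where() + Any() and counts how many times the
    // Any() predicate runs. The int lists stand in for Customer.Id and
    // Order.CustomerId values.
    public static (List<int> Matched, long Comparisons) WhereAnyWithCount(
        List<int> customerIds, List<int> orderCustomerIds)
    {
        long comparisons = 0;
        var matched = customerIds
            .Where(id => orderCustomerIds.Any(orderCustomerId =>
            {
                comparisons++;
                return orderCustomerId == id;
            }))
            .ToList(); // ToList() forces evaluation before comparisons is read

        return (matched, comparisons);
    }
}
```

For the sample data (customer Ids 1, 2, 3 and order `CustomerId` values 1, 2, 2), the predicate runs 6 times: once for Id 1, twice for Id 2, and three times for Id 3, which has no orders and therefore forces a full scan. The nested `foreach` loops always perform 3 × 3 = 9 comparisons.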
Use LINQ Query Syntax to Compare Lists
Let's now implement the next solution with LINQ query syntax. Despite its name, `JoinMethod()` does not use the `join` clause; it filters customers with `where` and `Any()`:
```csharp
public static List<Customer> JoinMethod(List<Customer> customerList, List<Order> orderList)
{
    var customersWithOrders = (from customer in customerList
                               where orderList.Any(order => customer.Id == order.CustomerId)
                               select customer).ToList();

    return customersWithOrders;
}
```
Here, we use the `from` clause to iterate through each element in `customerList`. Then, we use the `where` clause to filter customers based on the existence of any `Order` in `orderList` whose `CustomerId` matches the `Id` of the current `Customer`.
The `select` clause then determines what goes into the result: the `Customer` object that satisfies the filtering condition. Finally, we wrap the whole query in parentheses and call `ToList()` to materialize the result into a `List<Customer>`. The compiler translates this query into the same `Where()` and `Any()` calls as `WhereAnyMethod()`, so the two methods do the same work.
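For comparison, a query expression that actually uses the `join` clause might look like the following sketch. The class `QueryJoinSketch` and method `JoinClauseMethod` are hypothetical, and the `Customer` and `Order` classes are trimmed copies of the ones defined earlier so that the snippet compiles on its own:

```csharp
using System.Collections.Generic;
using System.Linq;

// Trimmed copies of the article's classes so the sketch is self-contained.
public class Customer
{
    public int Id { get; set; }
    public string Firstname { get; set; } = string.Empty;
}

public class Order
{
    public int CustomerId { get; set; }
    public int OrderId { get; set; }
}

public static class QueryJoinSketch
{
    public static List<Customer> JoinClauseMethod(List<Customer> customerList, List<Order> orderList)
    {
        // The join clause matches customers and orders on equal keys and
        // produces one result per matching (customer, order) pair.
        var matches = from customer in customerList
                      join order in orderList on customer.Id equals order.CustomerId
                      select customer;

        // A customer with several orders appears several times, so remove repeats.
        // Distinct() uses reference equality here because Customer does not override Equals().
        return matches.Distinct().ToList();
    }
}
```

The compiler translates the `join` clause into a call to the `Join()` extension method, so this query is the query-syntax counterpart of the `Join()`-based approach rather than of the `Where()`/`Any()` filter.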
Compare Lists With Join Extension Method
Let's use the `Join()` extension method, which LINQ provides for any `IEnumerable<T>`, including `List<T>`, to find the customers with orders:
```csharp
public static List<Customer> JoinListMethod(List<Customer> customerList, List<Order> orderList)
{
    return customerList.Join(
            orderList,
            customer => customer.Id,
            order => order.CustomerId,
            (customer, order) => customer)
        .Distinct()
        .ToList();
}
```
Here, we perform an inner join between `customerList` and `orderList`. LINQ matches the elements on their keys: the `Id` property of each `Customer` and the `CustomerId` property of each `Order`. Conceptually, the join produces one pair for each matching customer and order.
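The result selector `(customer, order) => customer` specifies that only the `Customer` part of each pair goes into the final result. Because a customer with several orders appears once per order, we then call `Distinct()` so that each `Customer` appears only once. `Distinct()` uses the default equality comparer, which for `Customer` means reference equality since the class does not override `Equals()`; this works here because the join returns the original instances from `customerList`. Finally, we convert the result into a `List<Customer>` with the `ToList()` method.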
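If separate `Customer` objects could represent the same customer, reference-based deduplication would keep both of them. Here is a minimal sketch that deduplicates by key instead, assuming .NET 6 or later, where `Enumerable.DistinctBy()` is available; the class `JoinDistinctBySketch` and the generic helper `JoinDistinctByKey` are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class JoinDistinctBySketch
{
    // Hypothetical generic helper: inner-joins two sequences on a key and keeps
    // the first outer item for each distinct key. Requires .NET 6+ for DistinctBy().
    public static List<TOuter> JoinDistinctByKey<TOuter, TInner, TKey>(
        IEnumerable<TOuter> outer,
        IEnumerable<TInner> inner,
        Func<TOuter, TKey> outerKey,
        Func<TInner, TKey> innerKey)
    {
        return outer
            .Join(inner, outerKey, innerKey, (outerItem, _) => outerItem)
            .DistinctBy(outerKey) // compares keys, not object references
            .ToList();
    }
}
```

With the article's types, the call would be `JoinDistinctBySketch.JoinDistinctByKey(customerList, orderList, c => c.Id, o => o.CustomerId)`.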
Use HashSet to Compare the Lists
The next solution builds a `HashSet<int>` and uses it to retrieve the result we want:

```csharp
public static List<Customer> HashSetMethod(List<Customer> customerList, List<Order> orderList)
{
    var customerIds = orderList.Select(i => i.CustomerId).ToHashSet();

    return customerList.Where(i => customerIds.Contains(i.Id)).ToList();
}
```
This time, we create a `HashSet<int>`, `customerIds`, containing the distinct `CustomerId` values we extract from `orderList`. A `HashSet<T>` stores each value only once and, because it is backed by a hash table, its `Contains()` check runs in constant time on average.

Next, we filter `customerList` with the `Where()` method, keeping only the customers whose `Id` is present in `customerIds`. Finally, we convert the result into a `List<Customer>` with the `ToList()` method.
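The same idea generalizes to any pair of lists that share a key. A minimal sketch, where the class `HashSetFilterSketch`, the helper `WhereKeyIn`, and its generic signature are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class HashSetFilterSketch
{
    // Hypothetical reusable form of HashSetMethod: keeps the items of `source`
    // whose key occurs among the keys of `other`.
    public static List<TSource> WhereKeyIn<TSource, TOther, TKey>(
        IEnumerable<TSource> source,
        IEnumerable<TOther> other,
        Func<TSource, TKey> sourceKey,
        Func<TOther, TKey> otherKey)
    {
        // One pass over `other` builds the set of distinct keys.
        var keys = new HashSet<TKey>(other.Select(otherKey));

        // One pass over `source`; HashSet<T>.Contains is O(1) on average.
        return source.Where(item => keys.Contains(sourceKey(item))).ToList();
    }
}
```

Called as `HashSetFilterSketch.WhereKeyIn(customers, orders, c => c.Id, o => o.CustomerId)`, it returns the same customers as `HashSetMethod()`, in the same order.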
Now, let’s compare our methods by running a set of benchmarks.
Benchmark Setup
Next, we evaluate these methods with a benchmark that measures execution time and memory allocation. In our BenchmarkDotNet benchmark class, we add a setup method and two helper methods that generate the test data:
```csharp
// Members of the benchmark class; requires `using BenchmarkDotNet.Attributes;`
private List<Customer>? _customers;
private List<Order>? _orders;

[GlobalSetup]
public void GlobalSetup()
{
    var numberOfCustomers = 10000;
    var numberOfOrders = 500000;
    _customers = GenerateRandomCustomers(numberOfCustomers).ToList();
    _orders = GenerateRandomOrders(numberOfOrders, _customers).ToList();
}

private static IEnumerable<Customer> GenerateRandomCustomers(int count)
{
    return Enumerable.Range(1, count)
        .Select(i => new Customer
        {
            Id = i,
            Firstname = $"CustomerFirstname{i}",
            Surname = $"CustomerSurname{i}"
        });
}

private static IEnumerable<Order> GenerateRandomOrders(int count, List<Customer> customers)
{
    var random = new Random();
    return Enumerable.Range(1, count)
        .Select(i => new Order
        {
            OrderId = i,
            // Random.Next's upper bound is exclusive, so ids fall in 1..customers.Count
            CustomerId = random.Next(1, customers.Count + 1)
        });
}
```
In our benchmark class, the `GlobalSetup()` method prepares the data for the performance evaluation. We mark it with the `[GlobalSetup]` attribute so that it runs once before the benchmark methods execute.

Within it, we initialize `_customers` and `_orders`. We use the `GenerateRandomCustomers()` method to create a list of 10,000 customers, each with a unique Id, first name, and surname. Then we populate the `_orders` list with the `GenerateRandomOrders()` method, which generates 500,000 orders with unique `OrderId` values and assigns each order to a randomly selected customer from `_customers`.
The `GenerateRandomCustomers()` method creates a sequence of `Customer` objects of the specified size. Despite its name, the data is deterministic: `Enumerable.Range()` produces a sequence of integers, and `Select()` turns each integer into a `Customer` whose `Id` is that integer, so all Ids are distinct.
Similarly, the `GenerateRandomOrders()` method generates a sequence of `Order` objects, given the desired count and the list of customers. Using `Enumerable.Range()` and `Select()`, we create `Order` objects and set the `CustomerId` of each order to a random value between 1 and the total number of customers, inclusive. This guarantees that every order references an existing customer.
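Because `GenerateRandomOrders()` creates an unseeded `Random`, every benchmark run assigns orders to customers differently. If reproducible input data matters, passing a fixed seed to `Random` is one option. The sketch below assumes that approach; `SeededDataSketch`, `GenerateOrderCustomerIds`, and the `seed` parameter are hypothetical names, and the method returns only the `CustomerId` values to stay self-contained:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SeededDataSketch
{
    // Hypothetical variant of GenerateRandomOrders that only produces the
    // CustomerId values and takes an explicit seed, so repeated runs on the
    // same runtime generate the same data.
    public static List<int> GenerateOrderCustomerIds(int count, int customerCount, int seed)
    {
        var random = new Random(seed);

        // Random.Next(min, max) excludes max, so values fall in 1..customerCount.
        return Enumerable.Range(1, count)
            .Select(_ => random.Next(1, customerCount + 1))
            .ToList();
    }
}
```

Two calls with the same seed return identical lists on the same .NET runtime version; a seeded `Random` is not guaranteed to produce the same sequence across .NET versions.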
Together, these helper methods enable us to establish a realistic dataset for benchmarking methods designed to compare and filter lists of customers and orders based on a specific property.
Evaluation Results
Let’s evaluate the benchmark results:
| Method         | Mean          | Error       | StdDev      | Gen0      | Gen1      | Gen2     | Allocated   |
|--------------- |--------------:|------------:|------------:|----------:|----------:|---------:|------------:|
| HashSetMethod  | 3.501 ms      | 0.0671 ms   | 0.0848 ms   | 148.4375  | 82.0313   | 58.5938  | 1308.53 KB  |
| JoinListMethod | 35.188 ms     | 0.6938 ms   | 1.3200 ms   | 2400.0000 | 1400.0000 | 333.3333 | 13302.96 KB |
| JoinMethod     | 452.103 ms    | 3.1305 ms   | 2.6141 ms   | -         | -         | -        | 1512.98 KB  |
| WhereAnyMethod | 459.200 ms    | 9.0506 ms   | 10.4227 ms  | -         | -         | -        | 1506.85 KB  |
| ForEachMethod  | 10,195.861 ms | 162.9539 ms | 217.5387 ms | -         | -         | -        | 318.92 KB   |
Among the methods we test, `HashSetMethod()` stands out as the fastest by a wide margin, with a mean execution time of about 3.5 ms. It also allocates less memory than every other LINQ-based approach, although `ForEachMethod()` allocates even less. The method builds the set of customer Ids in a single pass over the orders, and `HashSet<T>.Contains()` then checks each customer in constant time on average, so the total work grows linearly with the size of the two lists instead of with their product. Because the set stores each distinct `CustomerId` only once, it holds at most 10,000 entries even though there are 500,000 orders, which keeps its memory footprint small.
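Next, `JoinListMethod()` ranks second, although at about 35 ms it is roughly ten times slower than `HashSetMethod()`. It also allocates the most memory (about 13 MB), most likely because `Join()` internally builds a lookup over all 500,000 orders and `Distinct()` adds another hash-based set. `JoinMethod()` and `WhereAnyMethod()` share third place at about 450 ms, which is expected because both compile to the same `Where()` and `Any()` calls. Finally, the least performant method is `ForEachMethod()`, with a mean execution time of about 10 seconds. It allocates the least memory, because apart from the result list it creates no intermediate collections, but its nested loops compare every customer with every order and never stop early.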
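A rough count of the work, with n = 10,000 customers and m = 500,000 orders as generated in `GlobalSetup()`, helps explain the gap between the fastest and slowest methods. The nested loops always compare every customer with every order, while the `HashSet` approach inserts each order's `CustomerId` once and performs one lookup per customer (constant factors and the `Contains()` calls on the result list are ignored):

$$
\begin{aligned}
\text{ForEachMethod:}\quad & n \cdot m = 10^{4} \times (5 \times 10^{5}) = 5 \times 10^{9} \ \text{key comparisons} \\
\text{HashSetMethod:}\quad & m + n = 5 \times 10^{5} + 10^{4} = 5.1 \times 10^{5} \ \text{hash-set operations}
\end{aligned}
$$

That is roughly 10,000 times fewer operations. The measured speed-up (10,195.861 ms versus 3.501 ms, about 2,900 times) is smaller, plausibly because a hash-set insertion or lookup costs considerably more than a single integer comparison.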
These results suggest that a `HashSet` gives the best performance among the approaches we tested, which makes it a compelling choice when comparing large lists by a specific property.
Conclusion
In this article, we explored and benchmarked several methods for comparing two lists by a specific property, ranging from nested loops to LINQ and `HashSet`-based techniques. With our dataset, the `HashSet` approach was the fastest by an order of magnitude, while the nested `foreach` loops were the slowest.