In this article, we are going to explore how to select multiple records based on a list of IDs using LINQ. We will demonstrate two ways to do that, and we will investigate the cases in which each might be a better choice.

To download the source code for this article, you can visit our GitHub repository.

Let’s start!

Setting the Stage

To demonstrate selecting the records, let’s create a console application.

Support Code Maze on Patreon to get rid of ads and get the best discounts on our products!
Become a patron at Patreon!

Let’s start by creating a simple Employee entity: 

public class Employee
{
    public int Id { get; set; }

    public required string Name { get; set; }
}

The entity has only two fields: employee Id and Name.

After that, we will abstract our database using Microsoft.EntityFrameworkCore and the SQLite database. Take a look at the source code in our sample project to see how we create our DbContext and seed the database.

We want to select multiple records from our database. To accomplish that, we create several random IDs controlled by the SelectCount constant: 

var random = new Random();

var idList = new List<int>();
var idHashSet = new HashSet<int>();

for (var cnt = 0; cnt < SelectCount; cnt++)
{
    var id = random.Next(1, RecordCount);
    idList.Add(id);
    idHashSet.Add(id);
}

The random numbers are within the range from 1 to the RecordCount constant, so we can be sure there are always all the records we want to select. We populate the idList list and the idHashSet hashset with these randomly generated record IDs.

Finally, we create the EmployeeRepository class, which encapsulates all of our selections:

var employeeRepository = new EmployeeRepository(dbContext);

var employees = employeeRepository.GetEmployeesUsingWhere(idList);
employees = employeeRepository.GetEmployeesUsingWhere(idHashSet);

employees = employeeRepository.GetEmployeesUsingJoin(idList);
employees = employeeRepository.GetEmployeesUsingJoin(idHashSet);

We call each of the implementations of different selections.

Now, we will see the details of each selection implementation and discuss its effects.

To learn more about LINQ, check out our article on LINQ Basic Concepts in C#.

Select Multiple Records Using Where() and Contains() Methods

The Where() method filters a sequence of values based on a predicate. Let’s look at the signature:

public static System.Collections.Generic.IEnumerable<TSource> Where<TSource> (this System.Collections.Generic.IEnumerable<TSource> source, Func<TSource,bool> predicate);

The predicate is a function to test each sequence element for a specified condition. In our example, we want the list passed as an argument containing the record’s IDs. 

The Contains() method determines if the sequence contains a particular element. Let’s implement some methods to demonstrate this:

public List<Employee> GetEmployeesUsingWhere(IList<int> ids)
{
    var query = _dbContext.Employees.Where(e => ids.Contains(e.Id));

    return [.. query];
}

public List<Employee> GetEmployeesUsingWhere(HashSet<int> ids)
{
    var query = _dbContext.Employees.Where(e => ids.Contains(e.Id));

    return [.. query];
}

Here, we define two method overloads, GetEmployeesUsingWhere(), one accepting a List as the parameter, and one a HashSet. We use the Where() and Contains() method to determine if our database contains the set of IDs passed in.

We can inspect the generated SQL statement in DebugView of the query variable. Alternatively, we can call the ToQueryString() method on the query variable. 

Let’s check the output when the parameter is a List:

.param set @__ids_0 '[5440,8012,2162,4545,2118,1255,1976,4425,3282,889]'

SELECT "e"."Id", "e"."Name"
FROM "Employees" AS "e"
WHERE "e"."Id" IN (
    SELECT "i"."value"
    FROM json_each(@__ids_0) AS "i"

Similarly, let’s check the output for a HashSet

SELECT "e"."Id", "e"."Name"
FROM "Employees" AS "e"
WHERE "e"."Id" IN (5440, 8012, 2162, 4545, 2118, 1255, 1976, 4425, 3282, 889)

ID values will differ for each execution since we generate them randomly. 

Analysis of the Selection Using Where() and Contains() Methods

The combination of the Where() and Contains() methods will be translated to WHERE..IN SQL syntax in both cases. But, as we can see in the two above code snippets, the difference is that the list will be represented as a JSON array and additional SELECT clauses selecting the values as the input for the IN set, while the HashSet, which consists only of unique values, will be directly used as values for the IN set. 

An additional limitation is the number of elements in the IN clause. This limitation varies on the database provider but tends to be around 2000 elements. In other words, be aware that the use of the Where() and Contains() methods will not work for an extremely large number of IDs. 

We must scan the entire table to find all the records having one of our IDs within a given set. Across a large table, that could cause performance issues. The good news is that most advanced databases can optimize their query plans, and they will recognize that the above query can be optimized to the INNER JOIN syntax. 

Another caveat for the List parameter variant is that it will ignore duplicate values. Put another way, the list with values [1,2,2] will return only two records, the same as the list containing [1,2].

Select Multiple Records Using Join() Method

The Join() method correlates two sequences based on matching keys.

Again, let’s look at selecting multiple records with two method overloads:

public List<Employee> GetEmployeesUsingJoin(IList<int> ids)
{
    var query = _dbContext.Employees.Join(ids, e => e.Id, id => id, (e, id) => e);

    return [.. query];
}

public List<Employee> GetEmployeesUsingJoin(HashSet<int> ids)
{
    var query = _dbContext.Employees.Join(ids, e => e.Id, id => id, (e, id) => e);

    return [.. query];
}

This time, we define two method overloads, GetEmployeesUsingJoin(), again accepting a List and a HashSet, and use the Join() method to find any matches in the database.

Upon inspecting the generated SQL, we can see that the underlying generated SQL is the same for both methods: 

.param set @__p_0 '[5440,8012,2162,4545,2118,1255,1976,4425,3282,889]'

SELECT "e"."Id", "e"."Name"
FROM "Employees" AS "e"
INNER JOIN json_each(@__p_0) AS "p" ON "e"."Id" = "p"."value"

Analysis of the Selection Using Join() Methods

Most developers consider this SQL more readable and more performant. It also doesn’t have a limitation on the number of elements in the list of IDs.

In scenarios where there are duplicate key values, the query will return a corresponding number of records for each duplicate. In some cases, that may be what we want, but in any case, it’s undoubtedly something we want to be aware of. Of course, if we use the HashSet class, then duplicates are inherently prevented. 

On the other hand, the IDs in our list that do not exist in the database, will not appear in the result set. 

The only minor downside to Join() is that its LINQ expression syntax is a little bit more complex compared to the Where() method.

Conclusion

The answer to the question of how to select multiple records based on a list of IDs using LINQ is that, in most cases, there is no difference in the use of the Join() method vs. the combination of Where() and Contains() methods, and it comes down to personal preference. 

However, we should be aware of edge cases like the limitation of the number of elements for the IN clause of the SQL and the fact that it won’t return duplicate records. 

Although most database implementations will optimize and execute the generated SQL with the same execution plan, that may not be the case for all databases and could depend on database design. A good approach would be to inspect the exact behavior for each case. 

Liked it? Take a second to support Code Maze on Patreon and get the ad free reading experience!
Become a patron at Patreon!