In this article, we are going to explore how to select multiple records based on a list of IDs using LINQ. We will demonstrate two ways to do that, and we will investigate the cases in which each might be a better choice.
Let’s start!
Setting the Stage
To demonstrate selecting the records, let’s create a console application.
Let’s start by creating a simple Employee
entity:
public class Employee { public int Id { get; set; } public required string Name { get; set; } }
The entity has only two fields: employee Id
and Name
.
After that, we will abstract our database using Microsoft.EntityFrameworkCore
and the SQLite database. Take a look at the source code in our sample project to see how we create our DbContext
and seed the database.
We want to select multiple records from our database. To accomplish that, we create several random IDs controlled by the SelectCount
constant:
var random = new Random(); var idList = new List<int>(); var idHashSet = new HashSet<int>(); for (var cnt = 0; cnt < SelectCount; cnt++) { var id = random.Next(1, RecordCount); idList.Add(id); idHashSet.Add(id); }
The random numbers are within the range from 1 to the RecordCount
constant, so we can be sure there are always all the records we want to select. We populate the idList
list and the idHashSet
hashset with these randomly generated record IDs.
Finally, we create the EmployeeRepository
class, which encapsulates all of our selections:
var employeeRepository = new EmployeeRepository(dbContext); var employees = employeeRepository.GetEmployeesUsingWhere(idList); employees = employeeRepository.GetEmployeesUsingWhere(idHashSet); employees = employeeRepository.GetEmployeesUsingJoin(idList); employees = employeeRepository.GetEmployeesUsingJoin(idHashSet);
We call each of the implementations of different selections.
Now, we will see the details of each selection implementation and discuss its effects.
To learn more about LINQ, check out our article on LINQ Basic Concepts in C#.
Select Multiple Records Using Where() and Contains() Methods
The Where()
method filters a sequence of values based on a predicate. Let’s look at the signature:
public static System.Collections.Generic.IEnumerable<TSource> Where<TSource> (this System.Collections.Generic.IEnumerable<TSource> source, Func<TSource,bool> predicate);
The predicate is a function to test each sequence element for a specified condition. In our example, we want the list passed as an argument containing the record’s IDs.
The Contains()
method determines if the sequence contains a particular element. Let’s implement some methods to demonstrate this:
public List<Employee> GetEmployeesUsingWhere(IList<int> ids) { var query = _dbContext.Employees.Where(e => ids.Contains(e.Id)); return [.. query]; } public List<Employee> GetEmployeesUsingWhere(HashSet<int> ids) { var query = _dbContext.Employees.Where(e => ids.Contains(e.Id)); return [.. query]; }
Here, we define two method overloads, GetEmployeesUsingWhere()
, one accepting a List
as the parameter, and one a HashSet
. We use the Where()
and Contains()
method to determine if our database contains the set of IDs passed in.
We can inspect the generated SQL statement in DebugView
of the query
variable. Alternatively, we can call the ToQueryString()
method on the query
variable.
Let’s check the output when the parameter is a List
:
.param set @__ids_0 '[5440,8012,2162,4545,2118,1255,1976,4425,3282,889]' SELECT "e"."Id", "e"."Name" FROM "Employees" AS "e" WHERE "e"."Id" IN ( SELECT "i"."value" FROM json_each(@__ids_0) AS "i"
Similarly, let’s check the output for a HashSet
:
SELECT "e"."Id", "e"."Name" FROM "Employees" AS "e" WHERE "e"."Id" IN (5440, 8012, 2162, 4545, 2118, 1255, 1976, 4425, 3282, 889)
ID values will differ for each execution since we generate them randomly.
Analysis of the Selection Using Where() and Contains() Methods
The combination of the Where()
and Contains()
methods will be translated to WHERE..IN
SQL syntax in both cases. But, as we can see in the two above code snippets, the difference is that the list will be represented as a JSON array and additional SELECT
clauses selecting the values as the input for the IN
set, while the HashSet
, which consists only of unique values, will be directly used as values for the IN
set.
An additional limitation is the number of elements in the IN
clause. This limitation varies on the database provider but tends to be around 2000 elements. In other words, be aware that the use of the Where()
and Contains()
methods will not work for an extremely large number of IDs.
We must scan the entire table to find all the records having one of our IDs within a given set. Across a large table, that could cause performance issues. The good news is that most advanced databases can optimize their query plans, and they will recognize that the above query can be optimized to the INNER JOIN
syntax.
Another caveat for the List
parameter variant is that it will ignore duplicate values. Put another way, the list with values [1,2,2] will return only two records, the same as the list containing [1,2].
Select Multiple Records Using Join() Method
The Join()
method correlates two sequences based on matching keys.
Again, let’s look at selecting multiple records with two method overloads:
public List<Employee> GetEmployeesUsingJoin(IList<int> ids) { var query = _dbContext.Employees.Join(ids, e => e.Id, id => id, (e, id) => e); return [.. query]; } public List<Employee> GetEmployeesUsingJoin(HashSet<int> ids) { var query = _dbContext.Employees.Join(ids, e => e.Id, id => id, (e, id) => e); return [.. query]; }
This time, we define two method overloads, GetEmployeesUsingJoin()
, again accepting a List
and a HashSet
, and use the Join()
method to find any matches in the database.
Upon inspecting the generated SQL, we can see that the underlying generated SQL is the same for both methods:
.param set @__p_0 '[5440,8012,2162,4545,2118,1255,1976,4425,3282,889]' SELECT "e"."Id", "e"."Name" FROM "Employees" AS "e" INNER JOIN json_each(@__p_0) AS "p" ON "e"."Id" = "p"."value"
Analysis of the Selection Using Join() Methods
Most developers consider this SQL more readable and more performant. It also doesn’t have a limitation on the number of elements in the list of IDs.
In scenarios where there are duplicate key values, the query will return a corresponding number of records for each duplicate. In some cases, that may be what we want, but in any case, it’s undoubtedly something we want to be aware of. Of course, if we use the HashSet
class, then duplicates are inherently prevented.
On the other hand, the IDs in our list that do not exist in the database, will not appear in the result set.
The only minor downside to Join()
is that its LINQ expression syntax is a little bit more complex compared to the Where()
method.
Conclusion
The answer to the question of how to select multiple records based on a list of IDs using LINQ is that, in most cases, there is no difference in the use of the Join() method vs. the combination of Where() and Contains() methods, and it comes down to personal preference.
However, we should be aware of edge cases like the limitation of the number of elements for the IN clause of the SQL and the fact that it won’t return duplicate records.
Although most database implementations will optimize and execute the generated SQL with the same execution plan, that may not be the case for all databases and could depend on database design. A good approach would be to inspect the exact behavior for each case.