In this article, we will explore different methods to get the first record in each group with LINQ.
Grouping, searching, sorting, and filtering collections of objects are among the tasks developers perform daily. Another common task is finding the first (or last) record in each group of a grouped list.
So, let’s see how we can do it.
Set Up the Environment
First, let’s define our test environment. For our use cases, we will use a list of objects of type Student:
public class Student
{
    public string FirstName { get; set; } = string.Empty;
    public string LastName { get; set; } = string.Empty;
    public DateOnly DateOfBirth { get; set; }
    public int Class { get; set; }

    public override bool Equals(object? obj)
    {
        var student = obj as Student;
        if (student == null)
            return false;

        return student.FirstName.Equals(FirstName)
            && student.LastName.Equals(LastName)
            && student.DateOfBirth.Equals(DateOfBirth);
    }

    // Keep hash codes consistent with Equals() so hash-based collections behave correctly.
    public override int GetHashCode() => HashCode.Combine(FirstName, LastName, DateOfBirth);
}
Now, let’s define a GenerateStudents() method:
public static List<Student> GenerateStudents(int count = 100_000)
    => new Faker<Student>()
        .UseSeed(42)
        .RuleFor(m => m.FirstName, faker => faker.Person.FirstName)
        .RuleFor(m => m.LastName, faker => faker.Person.LastName)
        .RuleFor(m => m.DateOfBirth, faker => DateOnly.FromDateTime(faker.Person.DateOfBirth))
        .RuleFor(m => m.Class, faker => faker.PickRandom(Enumerable.Range(1,10)))
        .Generate(count);
Here, we use the library Bogus to define rules and generate our list of Students. We also set a value for the random seed to make our random values repeatable for benchmarking and testing.
Different Methods to Retrieve the First Record in Each Group
We will define and test multiple methods of grouping our Students by the property Class, and finding the youngest student in each class.
Retrieve the First Record in Each Group Using GroupBy
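There are two approaches we can take using the GroupBy() method.
The first approach uses the GroupBy() overload that accepts both a key selector and a result-selector Func delegate, so we group students into classes and project each group in a single call: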
public static List<Student> GetYoungestStudentInClassLinqGroupBy1(this List<Student> students)
    => students.GroupBy(m => m.Class,
            (key, g) => g.OrderByDescending(e => e.DateOfBirth).First())
        .ToList();
For the second parameter of GroupBy(), we define a Func to specify which object is returned for each group:
(key, g) => g.OrderByDescending(e => e.DateOfBirth).First()
In our case, we want the youngest student, so we return the first record in a group sorted by the DateOfBirth property in descending order.
Our second approach using GroupBy() is similar to the first one:
public static List<Student> GetYoungestStudentInClassLinqGroupBy2(this List<Student> students)
    => students.GroupBy(m => m.Class)
        .Select(g => g.OrderByDescending(e => e.DateOfBirth).First())
        .ToList();
In this case, instead of a Func delegate in the GroupBy() method, we pass a similar Func into the Select() method to retrieve the youngest student from within each group.
As we will see later, both methods have similar performance, so choose whichever one fits your programming style.
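Both approaches sort every group even though we only need one element from it. As a hedged sketch of an alternative that is not part of the benchmarks in this article, .NET 6 and later offer MaxBy(), which finds the element with the largest key in a single pass. The sample below uses tuples as a stand-in for the Student class so that it stays self-contained; the data and variable names are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data: tuples stand in for the article's Student class.
var students = new List<(string Name, DateOnly DateOfBirth, int Class)>
{
    ("Ann",  new DateOnly(2010, 5, 1),  1),
    ("Bob",  new DateOnly(2011, 3, 14), 1),
    ("Cleo", new DateOnly(2009, 8, 30), 2),
    ("Dan",  new DateOnly(2009, 1, 2),  2),
};

// MaxBy (available since .NET 6) scans each group once instead of sorting it.
var youngestPerClass = students
    .GroupBy(s => s.Class)
    .Select(g => g.MaxBy(s => s.DateOfBirth))
    .ToList();

foreach (var s in youngestPerClass)
    Console.WriteLine($"Class {s.Class}: {s.Name}");
```

Whether this is measurably faster on the 100,000-record list would need its own benchmark run.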
Extract the First Record in Each Group Using ToLookup
The next method on our list is ToLookup():
public static List<Student> GetYoungestStudentInClassLinqLookup(this List<Student> students)
    => students.ToLookup(m => m.Class)
        .Select(g => g.OrderByDescending(e => e.DateOfBirth).First())
        .ToList();
The ToLookup() method produces a one-to-many dictionary that maps the key to a collection of values. After that, we use the same Select() query against the dictionary as in the previous method.
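To make the one-to-many mapping more concrete, here is a minimal sketch on a tiny, hypothetical data set (tuples again stand in for Student). It shows that a lookup can be indexed by key, that a missing key yields an empty sequence rather than an exception, and that the groupings can be projected just like the result of GroupBy():

```csharp
using System;
using System.Linq;

// Hypothetical sample data: tuples stand in for the article's Student class.
var students = new[]
{
    (Name: "Ann",  DateOfBirth: new DateOnly(2010, 5, 1),  Class: 1),
    (Name: "Bob",  DateOfBirth: new DateOnly(2011, 3, 14), Class: 1),
    (Name: "Cleo", DateOfBirth: new DateOnly(2009, 8, 30), Class: 2),
};

// ToLookup runs immediately and maps each key to all of its elements.
var byClass = students.ToLookup(s => s.Class);

int inClassOne = byClass[1].Count();   // two students share key 1
int inClassNine = byClass[9].Count();  // a missing key yields an empty sequence

// Each grouping exposes its Key, so the same Select() pattern applies.
var youngest = byClass
    .Select(g => g.OrderByDescending(s => s.DateOfBirth).First())
    .ToList();

Console.WriteLine($"{inClassOne} {inClassNine} {youngest.Count}");
```

Unlike GroupBy(), which is evaluated lazily when enumerated, ToLookup() builds the whole lookup as soon as it is called.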
Using ToDictionary To Retrieve the First Record
For our final LINQ-based method, we use ToDictionary():
public static List<Student> GetYoungestStudentInClassLinqDictionary(this List<Student> students)
    => students.Select(m => m.Class).Distinct()
        .ToDictionary(m => m,
            m => students
                .Where(s => s.Class.Equals(m))
                .OrderByDescending(s => s.DateOfBirth).First())
        .Values
        .ToList();
Here, first, we need to prevent duplicate keys. For this, we select the Distinct() values of the Class property:
students.Select(m => m.Class).Distinct()
Now, we use the ToDictionary() method to generate a dictionary, with the Class used as a key. The Func delegate passed as the second argument (the element selector) ensures that each entry holds only the youngest student for a given Class property value.
Finally, we return a collection with only the values of the generated Dictionary.
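The reason for the Distinct() call can be shown with a short sketch: ToDictionary() throws an ArgumentException as soon as it encounters a key that is already present. The class numbers and labels below are hypothetical and chosen only to trigger that behavior:

```csharp
using System;
using System.Linq;

// Hypothetical class numbers with a repeated value.
int[] classes = { 1, 2, 1 };

bool threw = false;
try
{
    // The key 1 appears twice, so ToDictionary throws ArgumentException.
    _ = classes.ToDictionary(c => c, c => $"Class {c}");
}
catch (ArgumentException)
{
    threw = true;
}

// Removing duplicates first makes every key unique.
var labels = classes.Distinct().ToDictionary(c => c, c => $"Class {c}");

Console.WriteLine($"threw: {threw}, entries: {labels.Count}");
```

Note also that in the method above, the Where() call scans the full student list once for every distinct Class value, which is worth keeping in mind when reading the benchmark results.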
Iterate and Group Using Dictionary
While the focus of the article is LINQ methods, for reference, let’s define a method using a non-LINQ, iterative approach:
public static List<Student> GetYoungestStudentInClassIterativeDictionary(this List<Student> students)
{
    var groupedStudents = new Dictionary<int, Student>();

    foreach (var student in students)
    {
        // Store the student if the class has no entry yet, or if this student is younger.
        if (!groupedStudents.TryGetValue(student.Class, out var existingStudent)
            || student.DateOfBirth > existingStudent.DateOfBirth)
        {
            groupedStudents[student.Class] = student;
        }
    }

    return groupedStudents.Values.ToList();
}
Here, we first define the Dictionary object to store key-value pairs. In our case, the key Class is an integer, and the values are, of course, of type Student.
Next, we iterate through the list and add or update the youngest student for each Class.
As before, we return the list of our students stored in the Values property of the dictionary.
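One way to gain confidence that the loop matches the LINQ versions is to run both on a small hand-made list and compare the results. The following is only a sketch under that assumption; it uses tuples instead of Student, and the data and variable names are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data: tuples stand in for the article's Student class.
var students = new List<(string Name, DateOnly DateOfBirth, int Class)>
{
    ("Ann",  new DateOnly(2010, 5, 1),  1),
    ("Cleo", new DateOnly(2009, 8, 30), 2),
    ("Bob",  new DateOnly(2011, 3, 14), 1),
    ("Dan",  new DateOnly(2009, 1, 2),  2),
};

// Iterative version: keep only the youngest student seen so far per class.
var youngestByClass = new Dictionary<int, (string Name, DateOnly DateOfBirth, int Class)>();
foreach (var s in students)
{
    if (!youngestByClass.TryGetValue(s.Class, out var current)
        || s.DateOfBirth > current.DateOfBirth)
    {
        youngestByClass[s.Class] = s;
    }
}

// LINQ version for comparison.
var viaLinq = students
    .GroupBy(s => s.Class)
    .Select(g => g.OrderByDescending(s => s.DateOfBirth).First());

// Order both results by class so the comparison does not depend on enumeration order.
bool sameResult = youngestByClass.Values
    .OrderBy(s => s.Class)
    .SequenceEqual(viaLinq.OrderBy(s => s.Class));

Console.WriteLine($"Same result: {sameResult}");
```

Sorting both sides by Class before comparing avoids relying on the enumeration order of the dictionary's values.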
Benchmarking First Record Method Performance
Let’s see how these methods perform on a list with 100,000 records. For this, we will use the BenchmarkDotNet library. Let’s define our PerformanceBenchmark class:
[Orderer(SummaryOrderPolicy.FastestToSlowest)]
[HideColumns(new string[] { "Job", "Error", "StdDev", "Median" })]
[MemoryDiagnoser(false)]
[RankColumn]
public class PerformanceBenchmark
{
    private static readonly List<Student> _students = Methods.GenerateStudents();

    [Benchmark]
    public List<Student> LinqGroupBy1() => _students.GetYoungestStudentInClassLinqGroupBy1();

    [Benchmark]
    public List<Student> LinqGroupBy2() => _students.GetYoungestStudentInClassLinqGroupBy2();

    [Benchmark]
    public List<Student> LinqLookup() => _students.GetYoungestStudentInClassLinqLookup();

    [Benchmark]
    public List<Student> LinqDictionary() => _students.GetYoungestStudentInClassLinqDictionary();

    [Benchmark]
    public List<Student> IterativeDictionary() => _students.GetYoungestStudentInClassIterativeDictionary();
}
To run the benchmark test, we use:
BenchmarkRunner.Run<PerformanceBenchmark>();
Finally, we get the results for the 100,000 records:
| Method              | Mean      | Rank | Allocated  |
|-------------------- |----------:|-----:|-----------:|
| IterativeDictionary | 1.132 ms  | 1    | 1.13 KB    |
| LinqGroupBy1        | 5.346 ms  | 2    | 3347.57 KB |
| LinqLookup          | 5.632 ms  | 3    | 2566.4 KB  |
| LinqGroupBy2        | 5.633 ms  | 3    | 2566.43 KB |
| LinqDictionary      | 12.939 ms | 4    | 4.41 KB    |
While the LINQ-based methods are concise and express the grouping logic clearly, it’s worth noting that the best overall performance is offered by the simple foreach loop.
Also, we notice that the two methods that rely on a Dictionary require much less memory than the others: a few kilobytes, compared with roughly 2.5 to 3.3 MB.
Let’s see the results on 1,000,000 records:
| Method              | Mean      | Rank | Allocated  |
|-------------------- |----------:|-----:|-----------:|
| IterativeDictionary | 15.26 ms  | 1    | -          |
| LinqLookup          | 61.06 ms  | 2    | 20979768 B |
| LinqGroupBy1        | 67.99 ms  | 2    | 28979680 B |
| LinqGroupBy2        | 83.61 ms  | 3    | 20979808 B |
| LinqDictionary      | 178.67 ms | 4    | -          |
We notice that the LinqLookup method moved up to second place, sharing rank 2 with LinqGroupBy1, while the iterative approach still firmly holds first place.
That said, the simplicity and convenience of the LINQ methods still might be worth the performance cost.
Conclusion
In this article, we analyzed different ways of grouping objects and finding the first record for each group. Besides the core LINQ-based methods, we analyzed an iterative method to achieve the same task. Finally, we benchmarked the performance of each method. While the LINQ-based methods provide a simple, easy-to-understand way of grouping data, if performance is paramount in our use case, it may be worth considering an iterative approach.