In this article, we will explore different methods to get the first record in each group with LINQ.

To download the source code for this article, you can visit our GitHub repository.

Grouping, searching, sorting, filtering, and other manipulations of object collections are among the tasks developers perform daily. Another common task is finding the first (or last) record in each group of a grouped collection.

So, let’s see how we can do it.


Set Up the Environment

First, let’s define our test environment. For our use cases, we will use a list of objects of type Student:

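public class Student
{
    public string FirstName { get; set; } = string.Empty;
    public string LastName { get; set; } = string.Empty;
    public DateOnly DateOfBirth { get; set; }
    public int Class { get; set; }

    // Two students are equal when their names and dates of birth match
    public override bool Equals(object? obj)
    {
        if (obj is not Student student) return false;

        return student.FirstName.Equals(FirstName) &&
            student.LastName.Equals(LastName) &&
            student.DateOfBirth.Equals(DateOfBirth);
    }

    // Overriding Equals() without GetHashCode() triggers warning CS0659 and
    // breaks hash-based collections, so we hash the same properties
    public override int GetHashCode() => HashCode.Combine(FirstName, LastName, DateOfBirth);
}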

Now, let’s define a GenerateStudents() method:

// Defined in the static Methods class; requires the Bogus NuGet package (using Bogus;)
public static List<Student> GenerateStudents(int count = 100_000) =>
    new Faker<Student>()
    .UseSeed(42)
    .RuleFor(m => m.FirstName, faker => faker.Person.FirstName)
    .RuleFor(m => m.LastName, faker => faker.Person.LastName)
    .RuleFor(m => m.DateOfBirth, faker => DateOnly.FromDateTime(faker.Person.DateOfBirth))
    .RuleFor(m => m.Class, faker => faker.PickRandom(Enumerable.Range(1, 10)))
    .Generate(count);

Here, we use the Bogus library to define rules and generate our list of students. We also set a fixed random seed so that the generated values are repeatable across benchmark and test runs.

Different Methods to Retrieve the First Record in Each Group

We will define and test multiple methods that group our students by the Class property and find the youngest student in each class.

Retrieve the First Record in Each Group Using GroupBy

There are two approaches we can take using the GroupBy() method.

The first approach uses the GroupBy() overload that accepts a result selector, a Func delegate applied to each group of students that share the same Class:

public static List<Student> GetYoungestStudentInClassLinqGroupBy1(this List<Student> students) 
    => students.GroupBy(m => m.Class, (key, g) => g.OrderByDescending(e => e.DateOfBirth).First())
    .ToList();

The second parameter of GroupBy() is a Func that receives the group key and the elements of the group, and specifies which object to return for that group:

(key, g) => g.OrderByDescending(e => e.DateOfBirth).First()

In our case, we want the youngest student, so we return the first record in a group sorted by the DateOfBirth property in descending order.
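To see what the result selector receives, here is a minimal, self-contained sketch on a handful of hypothetical students. To keep it runnable as a single file with top-level statements, a value tuple stands in for the Student class, and the selector uses both its key and group arguments to build a summary line:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data; a value tuple stands in for the Student class
var students = new List<(string FirstName, int Class, DateOnly DateOfBirth)>
{
    ("Ana", 1, new DateOnly(2010, 3, 14)),
    ("Ben", 1, new DateOnly(2011, 7, 2)),
    ("Cleo", 2, new DateOnly(2009, 1, 30)),
    ("Dario", 2, new DateOnly(2009, 11, 5)),
    ("Eva", 3, new DateOnly(2012, 5, 21)),
};

// The result selector gets the key (class number) and the group's elements;
// sorting by DateOfBirth in descending order puts the youngest student first
var youngestPerClass = students
    .GroupBy(
        s => s.Class,
        (key, g) => $"Class {key}: {g.OrderByDescending(s => s.DateOfBirth).First().FirstName}")
    .ToList();

foreach (var line in youngestPerClass)
{
    Console.WriteLine(line);
}
// Class 1: Ben
// Class 2: Dario
// Class 3: Eva
```

GroupBy() yields groups in the order their keys first appear in the source, which is why the output follows class 1, 2, 3 here.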

Our second approach using GroupBy() is similar to the first one:

public static List<Student> GetYoungestStudentInClassLinqGroupBy2(this List<Student> students) 
    => students.GroupBy(m => m.Class).Select(g => g.OrderByDescending(e => e.DateOfBirth).First())
    .ToList();

In this case, instead of a Func delegate in the GroupBy() method, we pass a similar Func into the Select() method to retrieve the youngest student from within each group.

As we will see later, both methods have similar performance, so choose whichever one fits your programming style.
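As a quick sanity check, the following sketch (hypothetical tuple data, not the article's Student class) runs both GroupBy() approaches side by side and compares their results:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data; a value tuple stands in for the Student class
var students = new List<(string FirstName, int Class, DateOnly DateOfBirth)>
{
    ("Ana", 1, new DateOnly(2010, 3, 14)),
    ("Ben", 1, new DateOnly(2011, 7, 2)),
    ("Cleo", 2, new DateOnly(2009, 1, 30)),
    ("Dario", 2, new DateOnly(2009, 11, 5)),
    ("Eva", 3, new DateOnly(2012, 5, 21)),
};

// Approach 1: result selector passed directly to GroupBy()
var viaResultSelector = students
    .GroupBy(s => s.Class, (key, g) => g.OrderByDescending(s => s.DateOfBirth).First())
    .ToList();

// Approach 2: GroupBy() followed by Select() over each IGrouping
var viaSelect = students
    .GroupBy(s => s.Class)
    .Select(g => g.OrderByDescending(s => s.DateOfBirth).First())
    .ToList();

// Value tuples compare element by element, so this checks that both
// approaches picked the same student for every class, in the same order
Console.WriteLine(viaResultSelector.SequenceEqual(viaSelect)); // True
```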

Extract the First Record in Each Group Using ToLookup

Next, let's use the ToLookup() method:

public static List<Student> GetYoungestStudentInClassLinqLookup(this List<Student> students) 
    => students.ToLookup(m => m.Class).Select(g => g.OrderByDescending(e => e.DateOfBirth).First())
    .ToList();

The ToLookup() method immediately builds an ILookup<TKey, TElement>, a one-to-many dictionary that maps each key to a collection of values. After that, we run the same Select() query against the lookup as in the previous method.
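Here is a short sketch of how the lookup behaves, again with hypothetical tuple data standing in for Student. It indexes the lookup by key, including a key that does not exist, before applying the same Select() query:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data; a value tuple stands in for the Student class
var students = new List<(string FirstName, int Class, DateOnly DateOfBirth)>
{
    ("Ana", 1, new DateOnly(2010, 3, 14)),
    ("Ben", 1, new DateOnly(2011, 7, 2)),
    ("Cleo", 2, new DateOnly(2009, 1, 30)),
    ("Dario", 2, new DateOnly(2009, 11, 5)),
    ("Eva", 3, new DateOnly(2012, 5, 21)),
};

// ToLookup() executes immediately and returns an ILookup<int, T>;
// GroupBy(), by contrast, defers execution until the result is enumerated
var byClass = students.ToLookup(s => s.Class);

// Indexing by key returns that key's elements; a missing key returns
// an empty sequence instead of throwing, unlike a Dictionary indexer
Console.WriteLine(byClass[2].Count()); // 2
Console.WriteLine(byClass[99].Any());  // False

// Enumerating the lookup yields IGrouping<int, T> objects, so the same
// Select() query used with GroupBy() picks the youngest student per class
var youngest = byClass
    .Select(g => g.OrderByDescending(s => s.DateOfBirth).First())
    .ToList();

Console.WriteLine(youngest.Count); // 3
```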

Using ToDictionary To Retrieve the First Record

For our final LINQ-based method, we use ToDictionary():

public static List<Student> GetYoungestStudentInClassLinqDictionary(this List<Student> students) 
    => students.Select(m => m.Class).Distinct()
        .ToDictionary(m => m,
            m => students
                .Where(s => s.Class.Equals(m))
                .OrderByDescending(s => s.DateOfBirth).First())
        .Values
        .ToList();

First, we need to avoid duplicate keys, because ToDictionary() throws an ArgumentException when the same key appears more than once. For this, we select the Distinct() values of the Class property:

students.Select(m => m.Class).Distinct()

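Now, we use the ToDictionary() method to build a dictionary keyed by Class. The value selector filters the students of each Class with Where() and takes the youngest one, so each dictionary entry holds a single student. Note that this selector scans the entire list once per distinct Class value, which largely explains why this method is the slowest in the benchmarks below.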

Finally, we return a collection with only the values of the generated Dictionary.
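The following sketch, using hypothetical tuple data in place of Student, illustrates why the Distinct() step is needed and how the value selector picks one student per class:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data; a value tuple stands in for the Student class
var students = new List<(string FirstName, int Class, DateOnly DateOfBirth)>
{
    ("Ana", 1, new DateOnly(2010, 3, 14)),
    ("Ben", 1, new DateOnly(2011, 7, 2)),
    ("Cleo", 2, new DateOnly(2009, 1, 30)),
    ("Dario", 2, new DateOnly(2009, 11, 5)),
    ("Eva", 3, new DateOnly(2012, 5, 21)),
};

// Without Distinct(), class numbers repeat, and ToDictionary()
// throws an ArgumentException as soon as it meets a duplicate key
var duplicateKeyDetected = false;
try
{
    _ = students.Select(s => s.Class).ToDictionary(c => c);
}
catch (ArgumentException)
{
    duplicateKeyDetected = true;
}
Console.WriteLine(duplicateKeyDetected); // True

// With Distinct(), each class appears once as a key; the value selector then
// scans the whole list for every class, so the cost grows with classes x students
var youngestByClass = students
    .Select(s => s.Class)
    .Distinct()
    .ToDictionary(
        c => c,
        c => students
            .Where(s => s.Class == c)
            .OrderByDescending(s => s.DateOfBirth)
            .First());

Console.WriteLine(youngestByClass[2].FirstName); // Dario
```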

Iterate and Group Using Dictionary

While the focus of the article is LINQ methods, for reference, let’s define a method using a non-LINQ, iterative approach:

public static List<Student> GetYoungestStudentInClassIterativeDictionary(this List<Student> students)
{
    // One entry per class: Class -> youngest student found so far
    var groupedStudents = new Dictionary<int, Student>();
    foreach (var student in students)
    {
        // Store the student if the class is new, or replace the stored one if this student is younger
        if (!groupedStudents.TryGetValue(student.Class, out var existingStudent) ||
            student.DateOfBirth > existingStudent.DateOfBirth)
        {
            groupedStudents[student.Class] = student;
        }
    }

    return groupedStudents.Values.ToList();
}

Here, we first create a Dictionary that holds one entry per class: the keys are the integer Class values, and the values are Student objects.

Next, we iterate through the list once. When a student's Class is not in the dictionary yet, or the student is younger than the one currently stored, we store that student for the Class.

As before, we return the list of our students stored in the Values property of the dictionary.
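Here is the same single-pass idea as a standalone sketch, assuming value tuples in place of the Student class. Sorting by key before printing keeps the output order deterministic, since Dictionary does not guarantee an enumeration order:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data; a value tuple stands in for the Student class
var students = new List<(string FirstName, int Class, DateOnly DateOfBirth)>
{
    ("Ana", 1, new DateOnly(2010, 3, 14)),
    ("Ben", 1, new DateOnly(2011, 7, 2)),
    ("Cleo", 2, new DateOnly(2009, 1, 30)),
    ("Dario", 2, new DateOnly(2009, 11, 5)),
    ("Eva", 3, new DateOnly(2012, 5, 21)),
};

// Class -> youngest student seen so far, filled in a single pass over the list
var youngestByClass = new Dictionary<int, (string FirstName, int Class, DateOnly DateOfBirth)>();
foreach (var student in students)
{
    // TryGetValue() is false for a class we have not seen yet; otherwise
    // we replace the stored student only when the current one is younger
    if (!youngestByClass.TryGetValue(student.Class, out var existing) ||
        student.DateOfBirth > existing.DateOfBirth)
    {
        youngestByClass[student.Class] = student;
    }
}

// Sort by key before printing, because Dictionary enumeration order is not guaranteed
foreach (var entry in youngestByClass.OrderBy(e => e.Key))
{
    Console.WriteLine($"Class {entry.Key}: {entry.Value.FirstName}");
}
// Class 1: Ben
// Class 2: Dario
// Class 3: Eva
```

Because each student is visited exactly once and dictionary lookups are constant time on average, the work grows linearly with the number of students, regardless of how many classes there are.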

Benchmarking First Record Method Performance

Let’s see how these methods perform on a list of 100,000 students. For this, we will use the BenchmarkDotNet library. Let’s define our PerformanceBenchmark class:

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Order;

[Orderer(SummaryOrderPolicy.FastestToSlowest)]
[HideColumns("Job", "Error", "StdDev", "Median")]
[MemoryDiagnoser(false)] // report allocations, but hide the GC generation columns
[RankColumn]
public class PerformanceBenchmark 
{
    private static readonly List<Student> _students = 
        Methods.GenerateStudents();

    [Benchmark]
    public List<Student> LinqGroupBy1() => 
        _students.GetYoungestStudentInClassLinqGroupBy1();

    [Benchmark]
    public List<Student> LinqGroupBy2() => 
        _students.GetYoungestStudentInClassLinqGroupBy2();

    [Benchmark]
    public List<Student> LinqLookup() =>
        _students.GetYoungestStudentInClassLinqLookup();

    [Benchmark]
    public List<Student> LinqDictionary() =>
        _students.GetYoungestStudentInClassLinqDictionary();

    [Benchmark]
    public List<Student> IterativeDictionary() => 
        _students.GetYoungestStudentInClassIterativeDictionary();
}

To run the benchmarks, we call BenchmarkRunner.Run() from a Release build of our console application:

using BenchmarkDotNet.Running;

BenchmarkRunner.Run<PerformanceBenchmark>();

Here are the results for 100,000 records:

|              Method |      Mean | Rank |  Allocated |
|-------------------- |----------:|-----:|-----------:|
| IterativeDictionary |  1.132 ms |    1 |    1.13 KB |
|        LinqGroupBy1 |  5.346 ms |    2 | 3347.57 KB |
|          LinqLookup |  5.632 ms |    3 |  2566.4 KB |
|        LinqGroupBy2 |  5.633 ms |    3 | 2566.43 KB |
|      LinqDictionary | 12.939 ms |    4 |    4.41 KB |

While the LINQ-based methods are concise and express the grouping logic clearly, the iterative approach with a simple foreach loop performs best: at 1.132 ms, it is roughly 4.7 times faster than the fastest LINQ method, LinqGroupBy1 (5.346 ms).

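We also notice that the two methods built on a Dictionary, IterativeDictionary and LinqDictionary, allocate far less memory than the others, because they never build groups that hold references to every student.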

Now, let’s see the results for 1,000,000 records:

|              Method |      Mean | Rank |  Allocated |
|-------------------- |----------:|-----:|-----------:|
| IterativeDictionary |  15.26 ms |    1 |          - |
|          LinqLookup |  61.06 ms |    2 | 20979768 B |
|        LinqGroupBy1 |  67.99 ms |    2 | 28979680 B |
|        LinqGroupBy2 |  83.61 ms |    3 | 20979808 B |
|      LinqDictionary | 178.67 ms |    4 |          - |

The LinqLookup method moved up to second place, while the iterative approach still firmly holds first place, running about four times faster (15.26 ms versus 61.06 ms).

That said, the simplicity and readability of the LINQ methods might still be worth the performance cost.

Conclusion

In this article, we analyzed different ways of grouping objects and finding the first record for each group. Besides the core LINQ-based methods, we analyzed an iterative method to achieve the same task. Finally, we benchmarked the performance of each method. While the LINQ-based methods provide a simple, easy-to-understand way of grouping data, if performance is paramount in our use case, it may be worth considering an iterative approach.
