In this article, we are going to discuss how we can join MongoDB collections with .NET using an aggregation pipeline. Since we discussed MongoDB integration in a previous article, we’ll now focus on building the aggregation pipeline in .NET.

To download the source code for this article, you can visit our GitHub repository.

Working with NoSQL databases is often a complex task, especially when dealing with multiple collections. However, MongoDB has made these challenging tasks easier with the aggregation pipelines, which simplify the process of transforming documents through multiple stages. 

Let’s dive in.

Support Code Maze on Patreon to get rid of ads and get the best discounts on our products!
Become a patron at Patreon!

Prepare Our Environment To Join MongoDB Collections

To move forward with our example, We have to initialize our environment so that our aggregation pipeline can run smoothly and return our expected data. First, we need to create a MongoDB instance to host our data. Then we need to create our entities that will represent a model for our MongoDB collections. Finally, we need to seed our database with our data.

Spinning Off Our MongoDB Container

Let’s go ahead and create a new console application project and configure it with MongoDB. For this example, will we be using the MongoDB Test Container as our database.

To kick things off, let’s create a new MongoDB container by using the MongoDbBuilder class provided by the TestContainers.MongoDb package:

var mongoDbContainer = new MongoDbBuilder().Build();

Next, let’s start our MongoDb container by calling the StartAsync() method of the mongoDbContainer variable:

await mongoDbContainer.StartAsync();

Now we have a MongoDB container running in docker. Next, we need to create a MongoClient instance so we can use it in our code:

var mongoClient = new MongoClient(mongoDbContainer.GetConnectionString());

We call the GetConnectionString() method on the mongoDbContainer variable to get the connection string of the MongoDB service and pass it as an argument to the MongoClient class.

Subsequently, we can now initialize our database by using the GetDatabase() method in the MongoClient class:

var database = mongoClient.GetDatabase(DatabaseConfiguration.DatabaseName);

Creating Our Entities

Our next step is to create our entities. We will need two entities, a Course class:

public class Course
{
    [BsonElement("_id")]
    [BsonRepresentation(BsonType.ObjectId)]
    public string Id { get; set; } = string.Empty;

    [Required]
    [BsonElement("Name")]
    public string Name { get; set; } = string.Empty;

    [Required] 
    [BsonElement("Code")] 
    public string Code{ get; set; } = string.Empty;

    public override bool Equals(object? obj)
    {
        if (obj is not Course course) return false;
        return Name == course.Name 
               && Code == course.Code;
    }

    public override int GetHashCode()
    {
        return HashCode.Combine(Id, FirstName, LastName, Major, StudentCourses);
    }
}

And a Student class:

public class Student
{
    [BsonElement("_id")]
    [BsonRepresentation(BsonType.ObjectId)]
    public string Id { get; set; } = string.Empty;

    [Required]
    [BsonElement("FirstName")]
    public string FirstName { get; set; } = string.Empty;

    [Required]
    [BsonElement("LastName")]
    public string LastName { get; set; } = string.Empty;

    [Required] 
    [BsonElement("LastName")] 
    public string Major { get; set; } = string.Empty;

    [BsonElement("Courses")] 
    public List<Course> StudentCourses { get; set; } = [];

    public override bool Equals(object? obj)
    {
        if (obj is not Student student) return false;
        return FirstName == student.FirstName
               && LastName == student.LastName
                && StudentCourses.All(course => course
                    .Equals(student.StudentCourses.ElementAt(StudentCourses.IndexOf(course)))); 
    }


    public override int GetHashCode()
    {
        return HashCode.Combine(Id, Name, Code);
    }
}

We add the StudentCourses property as a list of the Course type in the Student class instead of the ObjectId list. We do that because we need to pull this property from the Course collection in the aggregation pipeline.

Seeding Our Database

Our final step is to seed our database

var courseCollection = database.GetCollection<BsonDocument>("Courses");
var studentCollection = database.GetCollection<BsonDocument>("Students");
await courseCollection.InsertManyAsync(new List<BsonDocument>
{
    new()
    {
        { "_id", new ObjectId("655e134180c300fcdd067d24") } ,
        { "Name", "Networks" },
        { "Code", "ECEN 474" }
    },
    new()
    {
        { "_id", new ObjectId("655e134180c300fcdd067d25") } ,
        { "Name", "Power Systems" },
        { "Code", "ECEN 485" }
    }
});

await studentCollection.InsertManyAsync(new List<BsonDocument>
{
    new()
    {
        { "_id", new ObjectId("656623db682962fa62ad75ba") } ,
        { "FirstName", "John" },
        { "LastName", "Doe" },
        { "Major", "Electrical Engineering" },
        { "Courses", new BsonArray {
            new ObjectId("655e134180c300fcdd067d24"),
            new ObjectId("655e134180c300fcdd067d25")
        } }
    }
});

We use the InsertManyAsync() method on the Course and Student collections and pass a list of entities as arguments to populate our collections with data.

We are now ready to start building our aggregation pipeline.

Building the Aggregation Pipeline To Join MongoDB Collections

Now that we’ve initialized our environment, we can start building our aggregation pipeline:

public class StudentRepository
{
    private readonly IMongoCollection<Student> _studentCollection;

    public StudentRepository()
    {
        var client = new MongoClient(DatabaseConfiguration.ConnectionString);
        var database = client.GetDatabase(DatabaseConfiguration.DatabaseName);
        _studentCollection = database.GetCollection<Student>("Students");
    }
}

First, we create a StudentRepository class and in the constructor of the class, we initialize our collections. Now, let’s see how we can create our aggregation pipeline.

The pipeline will consist of two stages: the $lookup and $project stages provided by MongoDB. Each stage will help in transforming our data until it matches our model.

We can start by creating an empty aggregation pipeline on our Students collection by using the Aggregate() method on the _studentCollection field:

var studentAggregationPipeline = _studentCollection.Aggregate();

With this, we can continue joining our course collection with the student collection.

Join MongoDB Collections With $lookup

$lookup is an aggregation stage that performs an outer left join between two collections.

In our case, we can use it to create an outer left join between our Students and Courses collections. We can achieve this by using the Lookup() method of the studentAggregationPipeline we created earlier.

This method accepts four arguments, foreign collection name, local field name, foreign primary key, and the desired field name:

var studentsJoinedWithCoursesPipeline = studentsWithoutCourseIdsPipeline
    .Lookup<Student, Student>("Courses", "Courses", "_id", "StudentCourses");

In our example, our foreign collection is the Courses collection, our local field name is Courses, our foreign primary key is _id, and the field we need to create is Courses.

Projection With $project

Our second stage in the pipeline is $project which we can use to include, exclude, or transform certain fields in our documents.

The purpose of this stage in our example is to exclude the Courses field from our Student documents since it will be replaced by the whole course document from the Courses collection:

var projection = Builders<Student>.Projection.Exclude("Courses");
var studentsWithoutCourseIdsPipeline = studentAggregationPipeline.Project<Student>(projection);

As we can see, we use the Exclude() method on the Builders<BsonDocument>.Projection property to exclude the Courses field from the result. Then, we call the Project() method on the studentAggregationPipeline variable and pass our projection as an argument.

Let’s put that all together:

public class StudentsRepository
{
    public async Task<List<Student>> GetAllUsers()
    {
        //Empty Pipeline
        var studentAggregationPipeline = _studentCollection.Aggregate();
    
        //Lookup
        var studentsJoinedWithCoursesPipeline = studentAggregationPipeline
           .Lookup<Student, Student>("Courses", "Courses", "_id", "StudentCourses");

        //Project
        var projection = Builders<Student>.Projection
            .Exclude("Courses");

        var studentsWithoutCourseIdsPipeline = studentsJoinedWithCoursesPipeline.Project<Student>(projection);

        var students = await studentsWithoutCourseIdsPipeline.ToListAsync();

        return students;
    }
}

Finally, we convert our result as a list using the ToListAsync() method on our setup pipeline studentsWithoutCourseIdsPipeline.

Now let’s use that code to get our students with courses:

var users = await repository.GetAllUsers();
foreach (var user in users)
{
    Console.WriteLine(user.ToJson());
}

By iterating through our users and printing them as JSON string with the ToJson() method to our console output, we can see that we have successfully joined the Students and Courses collections into a single result that is similar to our model class:

{ 
    "_id" : ObjectId("656623db682962fa62ad75ba"), 
    "FirstName" : "John", 
    "LastName" : "Doe", 
    "Major" : "Electrical Engineering", 
    "StudentCourses" : [
        { 
            "_id" : ObjectId("655e134180c300fcdd067d24"), 
            "Name" : "Networks", 
            "Code" : "ECEN 474" 
        }, 
        { 
            "_id" : ObjectId("655e134180c300fcdd067d25"), 
            "Name" : "Power Systems", 
            "Code" : "ECEN 485" 
        }
    ] 
}

In this result, we can see that the courses with object identifiers ObjectId("655e134180c300fcdd067d24") and ObjectId("655e134180c300fcdd067d25") from the Courses collection have been joined with the Student document from the Students collection.

Conclusion

Whether for small applications or enterprise applications, MongoDB is proving to be efficient and performant on all levels. With the great power that can be leveraged using the aggregation pipelines provided by MongoDB, there seems to be no limit to what can be achieved using NoSQL databases.

Liked it? Take a second to support Code Maze on Patreon and get the ad free reading experience!
Become a patron at Patreon!