In this article, we will review multiple ways of performing fast bulk inserts into a database using Entity Framework (EF for short). “Classic” Entity Framework was introduced in 2008 and was superseded nearly a decade later by EF Core, which continues to receive significant new features and performance improvements today. Because both older and newer EF versions remain in wide use, we’ll study insert performance across some of the more popular framework combinations.
After that, we will review some engineering requirements that can favor one approach over another when aiming for the fastest inserts.
And lastly, we will take a look at a library that uses the native bulk insert capabilities of different database providers.
Let’s dive in.
Single Insert vs. Batching
Before reviewing the code, let’s take a short detour through some engineering requirements.
To start with, it should be clear that the slowest approach would be to commit an insert transaction for each record separately. It would mean a database round-trip for each saved entity. There are pros to this: an entity is immediately available in the database and can be used for other components and processes. Also, we encounter less risk of losing the updates while the entity stays in memory, prior to being committed to the database.
On the other hand, reducing the number of round trips to the database and the amount of information processed along the way usually means storing more entities in process memory. These entities are not yet committed to the database and are not available for other components or processes. Accordingly, the risk of losing these changes grows.
Comparing the two approaches should give some hints as to our preferred path.
Example Projects
To demonstrate the outcomes of the different approaches discussed below, we will use three distinct projects implemented using the following combinations of EF, .NET, and C#:
- EF 6 / .NET 7 / C# 11
- EF Core 6 / .NET 6 / C# 10
- EF Core 7 / .NET 7 / C# 11
These represent some of the more common versions in use at the time of writing and should make for some good performance comparisons.
Also, to bypass any possible optimizations between .NET and Microsoft’s own homegrown SQL Server database, we will use an independent, open-source PostgreSQL database.
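As a sketch of how such a context might be wired up using the Npgsql provider for EF Core (the class name and connection string values here are illustrative placeholders, not the ones used in the article’s projects):

```csharp
using Microsoft.EntityFrameworkCore;

// A minimal EF Core DbContext configured for PostgreSQL via the
// Npgsql.EntityFrameworkCore.PostgreSQL provider package.
public class ModelContext : DbContext
{
    public DbSet<Person> People { get; set; }

    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
    {
        // Placeholder connection string; adjust host, database, and credentials.
        optionsBuilder.UseNpgsql(
            "Host=localhost;Database=bulk_insert_test;Username=postgres;Password=postgres");
    }
}
```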
Common Structures and Code
The projects all define a very simple model containing a single Person entity:
```csharp
public class Person
{
    public int PersonId { get; set; }
    public string Name { get; set; }
}
```
In the database, the PersonId property will be auto-generated on each insert.
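In EF Core, this can be stated explicitly in the model configuration (a minimal sketch; by convention, an integer key named PersonId would be treated as a database-generated identity column anyway):

```csharp
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    // PersonId is generated by the database (an identity/serial column in
    // PostgreSQL), so EF reads the generated value back after each insert.
    modelBuilder.Entity<Person>()
        .Property(p => p.PersonId)
        .ValueGeneratedOnAdd();
}
```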
For the rest of the methods, we need a list of random names to be inserted into the database. We can generate those as follows:
```csharp
private List<string> Names { get; set; }

private const string Chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
private static readonly Random Random = new Random();

private static string RandomString(int length)
{
    return new string(Enumerable.Repeat(Chars, length)
        .Select(s => s[Random.Next(s.Length)])
        .ToArray());
}

private int _batchSize;
public int BatchSize
{
    get => _batchSize;
    set
    {
        _batchSize = value;
        InitNames();
    }
}

private void InitNames()
{
    Names = new List<string>(_batchSize);
    for (var i = 0; i < _batchSize; i++)
    {
        Names.Add(RandomString(20));
    }
}
```
Benchmarking With the BenchmarkDotNet Package
We will use the BenchmarkDotNet package to come up with a statistically valid average time for each approach to complete.
We add the RunStrategy attribute to the class to request a single launch with one warmup run, followed by ten iterations of each benchmarking method.
Accordingly, each method of the TestModel class in each project is marked to produce a benchmark, and the BatchSize property is utilized to generate benchmarks for a few different batch sizes:
```csharp
[SimpleJob(RunStrategy.ColdStart, launchCount: 1, warmupCount: 1, iterationCount: 10)]
public class TestModel
{
    ...
    [Params(100, 1000, 3000)]
    public int BatchSize { ... }

    [Benchmark]
    public void AddOneByOneWithSave() { ... }
    ...
}
```
As a result, we should receive a reasonably accurate average time for the targeted batches to complete.
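To actually execute the benchmarks, the entry point only needs to hand the class to BenchmarkDotNet’s runner (a minimal sketch; the Program class itself is not shown in the article):

```csharp
using BenchmarkDotNet.Running;

public static class Program
{
    public static void Main(string[] args)
    {
        // Discovers all [Benchmark] methods on TestModel and runs them
        // using the RunStrategy configured via the class attribute.
        BenchmarkRunner.Run<TestModel>();
    }
}
```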
Test Bulk Insert Methods
Lastly, each of our three projects implements the same four bulk insert methods:
The AddOneByOneWithSave method calls the SaveChanges method after each Add call (code name Add*Save):
```csharp
[Benchmark]
public void AddOneByOneWithSave()
{
    var stopWatch = StartTimer();
    foreach (var name in Names)
    {
        _modelContext.Add(new Model.Person() { Name = name });
        _modelContext.SaveChanges();
    }
    PrintTimeElapsed(stopWatch);
}
```
Our AddOneByOne method calls the SaveChanges method once after all the individual Add calls (code name Adds+Save):
```csharp
[Benchmark]
public void AddOneByOne()
{
    var stopWatch = StartTimer();
    foreach (var name in Names)
    {
        _modelContext.Add(new Model.Person() { Name = name });
    }
    _modelContext.SaveChanges();
    PrintTimeElapsed(stopWatch);
}
```
The AddRange method uses a single AddRange call followed by a single SaveChanges call (code name AddRange+Save):
```csharp
[Benchmark]
public void AddRange()
{
    var stopWatch = StartTimer();
    IList<Model.Person> batch = new List<Model.Person>(
        Names.Select(n => new Model.Person() { Name = n }));
    _modelContext.AddRange(batch);
    _modelContext.SaveChanges();
    PrintTimeElapsed(stopWatch);
}
```
And finally, the BulkExtensionBulkInsert method uses the EFCore.BulkExtensions package with its BulkInsert method call (code name BulkInsert):
```csharp
[Benchmark]
public void BulkExtensionBulkInsert()
{
    var stopWatch = StartTimer();
    IList<Model.Person> batch = new List<Model.Person>(
        Names.Select(n => new Model.Person() { Name = n }));
    _modelContext.MockableBulkInsert(batch);
    PrintTimeElapsed(stopWatch);
}
```
Where the MockableBulkInsert method translates to:
```csharp
public virtual void MockableBulkInsert(IList<Model.Person> people)
{
    this.BulkInsert(people, c => { c.BatchSize = 1000; });
}
```
Examining the Logs
Running the three projects using a debugger provides the following insights:
- Add*Save
- EF 6: opens a connection, starts a transaction, inserts, commits, and disconnects for each entity.
- EF Core 6 & EF Core 7: opens a connection, inserts, and disconnects for each entity.
- Adds+Save
- EF 6: opens a connection, starts a transaction, inserts, commits, and disconnects for each entity.
- EF Core 6 & EF Core 7: opens a connection, inserts the whole batch, and disconnects.
- AddRange+Save
- EF 6: opens a connection, starts a transaction, inserts for each entity, commits, and disconnects.
- EF Core 6 & EF Core 7: opens a connection, inserts the whole batch, and disconnects.
- BulkInsert
- EF 6, EF Core 6, and EF Core 7: performs an optimized batch insert for the chosen database provider.
What we are after is the actual benchmarking to see the differences in the average time it takes for each setup to insert a batch of 100, 1000, and 3000 records into the database, so let’s take a look at those results.
Benchmarking Results
Using the BenchmarkDotNet package to bulk insert 100, 1000, and 3000 entities with the different methods generates these average (mean) times:
| Batch Size | Method | EF Core 7 / .NET 7 / C# 11 | EF Core 6 / .NET 6 / C# 10 | EF 6 / .NET 7 / C# 11 |
|---|---|---|---|---|
| 100 | AddOneByOneWithSave | 2,054.0 ms | 4,048.1 ms | 4,154.7 ms |
| 100 | AddOneByOne | 142.2 ms | 156.6 ms | 2,166.1 ms |
| 100 | AddRange | 145.0 ms | 147.4 ms | 2,143.0 ms |
| 100 | BulkExtensionBulkInsert | 107.3 ms | 118.7 ms | 288.4 ms |
| 1000 | AddOneByOneWithSave | 22,731.4 ms | 42,512.3 ms | 41,646.8 ms |
| 1000 | AddOneByOne | 350.9 ms | 355.7 ms | 21,048.3 ms |
| 1000 | AddRange | 333.8 ms | 349.6 ms | 20,337.7 ms |
| 1000 | BulkExtensionBulkInsert | 123.6 ms | 136.3 ms | 372.4 ms |
| 3000 | AddOneByOneWithSave | 85,397.4 ms | 145,356.5 ms | 139,716.9 ms |
| 3000 | AddOneByOne | 791.1 ms | 837.1 ms | 68,375.3 ms |
| 3000 | AddRange | 813.6 ms | 792.5 ms | 59,509.7 ms |
| 3000 | BulkExtensionBulkInsert | 175.0 ms | 181.4 ms | 522.2 ms |
As we can easily see from the table, in all cases, the performance of EF Core 7 when compiled with C# 11 and run on .NET 7 is better than any previous version. The numbers also confirm the massive improvement contributed by the community for the case of multiple entity additions followed by a single SaveChanges call.
This table also shows that if we develop code that receives thousands of entities to insert at once, the BulkExtensions package is the definite winner.
But if we process order-by-order, it is easier to call the Add method to submit each new entity through EF Core immediately, then follow up with a single SaveChanges call based on our batching strategy for a delayed commit. This still provides excellent performance thanks to EF Core’s batching support; in classic EF, on the other hand, it would cause a database round-trip for each addition.
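As a rough sketch of such a delayed-commit strategy (the OrderProcessor class, the FlushThreshold value, and the ModelContext name are illustrative assumptions, not code from the article’s projects):

```csharp
// Illustrative only: accumulate tracked entities as orders arrive and flush
// them in one round-trip once a threshold is reached. EF Core batches the
// pending inserts inside the single SaveChanges call.
public class OrderProcessor
{
    private const int FlushThreshold = 1000; // illustrative batch size
    private readonly ModelContext _context;
    private int _pending;

    public OrderProcessor(ModelContext context) => _context = context;

    public void ProcessOrder(string customerName)
    {
        // Submit the entity to EF Core immediately; no database I/O yet.
        _context.Add(new Model.Person { Name = customerName });
        if (++_pending >= FlushThreshold)
        {
            Flush();
        }
    }

    public void Flush()
    {
        _context.SaveChanges(); // one round-trip for the whole pending batch
        _pending = 0;
    }
}
```

A caller would also invoke Flush once at shutdown so a partially filled batch is not lost.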
Conclusion
In this article, we’ve reviewed key concepts for handling mass insertion using Entity Framework. In addition, we have also touched on the BulkExtensions package, which can help us to develop highly performant bulk insert code.
It is also essential to clearly state both our architectural constraints and our optimization goals before deciding on a final strategy.
Finally, all things change with time, frameworks evolve, and new libraries arise. So it is prudent to keep a watchful eye out for the latest ways of optimizing the performance of your database-facing applications with C# and .NET.