In this article, we’ll look at source-generated RegEx and how it can improve performance in our .NET applications.
Let’s dive in!
How Does RegEx Work?
Regular expressions are vital to the programming world, but do we know how they work in .NET?
Let’s examine the example:
using System.Text.RegularExpressions;

public static class PasswordValidator
{
    public static bool ValidatePasswordWithRegularRegEx(string password)
    {
        // A new Regex instance is constructed on every call.
        var regex = new Regex(@"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[^\da-zA-Z]).{8,}$");

        return regex.IsMatch(password);
    }
}
Inside the ValidatePasswordWithRegularRegEx() method, we create a new instance of the Regex class by passing a pattern to the constructor. The pattern validates that a password is at least 8 characters long and contains at least one lowercase letter, one uppercase letter, one digit, and one special character.
A few things happen when we use either the constructor or one of the static methods on the Regex class. First, the regex engine parses the pattern we pass to ensure its validity. Then, it turns the pattern into a node tree representation. Next, the tree is transformed into a set of instructions that the internal RegexInterpreter engine can interpret. Finally, when we try to match an input against the pattern, the interpreter goes over those instructions and evaluates them against the input. All of these steps happen at run time.
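To see the parsing step in action, here is a minimal sketch. Because the pattern is parsed when the Regex instance is constructed, an invalid pattern only fails at run time, inside the constructor. The PatternParsingDemo class and its IsValidPattern() method are hypothetical names used purely for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

public static class PatternParsingDemo
{
    // Hypothetical helper: returns true when the pattern can be parsed.
    public static bool IsValidPattern(string pattern)
    {
        try
        {
            // Constructing the Regex parses the pattern immediately.
            _ = new Regex(pattern);
            return true;
        }
        catch (ArgumentException)
        {
            // Thrown when the pattern is malformed, e.g. an unclosed group.
            return false;
        }
    }
}
```

Calling IsValidPattern(@"^\d+$") succeeds, while a pattern with an unclosed group such as "(abc" is rejected during construction.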
How Does Compiled RegEx Work?
With .NET, we have the option to use a compiled RegEx as well:
public static bool ValidatePasswordWithCompiledRegEx(string password)
    => Regex.IsMatch(
        password,
        @"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[^\da-zA-Z]).{8,}$",
        RegexOptions.Compiled);
We create the ValidatePasswordWithCompiledRegEx() method and use the static IsMatch() method to return the result. We pass it the password we wish to validate, the pattern, and the RegexOptions.Compiled enumeration value, which specifies that our application must use a compiled regular expression.
When we do this, all of the steps up to the generation of instructions for the RegexInterpreter engine are the same. But then, the regex engine further processes those instructions, turning them first into IL instructions and then into several DynamicMethod instances. So, when we try to perform a match, the engine doesn’t use the interpreter but executes those DynamicMethod instances instead. This makes matching faster, but constructing the regular expression becomes more costly because of the additional work performed up front.
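Since the extra cost is paid when the regular expression is constructed, one common way to pay it only once is to keep the compiled instance in a static field and reuse it. The following is a minimal sketch of that idea; the CachedPasswordValidator class and its members are hypothetical names, not part of the original example:

```csharp
using System.Text.RegularExpressions;

public static class CachedPasswordValidator
{
    // Constructed (and compiled to IL) once, when the class is first used.
    private static readonly Regex PasswordRegex = new(
        @"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[^\da-zA-Z]).{8,}$",
        RegexOptions.Compiled);

    // Every call reuses the same compiled instance.
    public static bool Validate(string password)
        => PasswordRegex.IsMatch(password);
}
```

The static Regex.IsMatch() method achieves a similar effect through an internal cache of recently used patterns (its size is controlled by Regex.CacheSize). Holding the instance ourselves simply makes the reuse explicit.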
What is Source-Generated RegEx in .NET?
With .NET 7, we got a new way of using regular expressions:
public static partial class PasswordValidator
{
    public static bool ValidatePasswordWithSourceGeneratedRegEx(string password)
        => PasswordRegEx().IsMatch(password);

    [GeneratedRegex(@"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[^\da-zA-Z]).{8,}$")]
    private static partial Regex PasswordRegEx();
}
We create the partial PasswordRegEx() method that returns a Regex instance and decorate it with the GeneratedRegex attribute. Because the method is partial, we mark our class as partial as well. Finally, we create the ValidatePasswordWithSourceGeneratedRegEx() method, which calls the IsMatch() method on the Regex instance returned by the PasswordRegEx() method.
When we use the GeneratedRegex
attribute on a partial
method that returns a Regex
instance, the internal source generator recognizes this and provides all the necessary logic behind the scenes.
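The GeneratedRegex attribute also accepts the options and match timeout we would otherwise pass to the Regex constructor. Here is a minimal sketch, assuming .NET 7 or later; the UsernameValidator class, its pattern, and the one-second timeout are hypothetical choices made only for illustration:

```csharp
using System.Text.RegularExpressions;

public static partial class UsernameValidator
{
    // Pattern, options, and a match timeout in milliseconds.
    // The source generator supplies the method body at build time.
    [GeneratedRegex(@"^[a-z][a-z0-9_]{2,15}$", RegexOptions.IgnoreCase, 1000)]
    private static partial Regex UsernameRegEx();

    public static bool IsValidUsername(string username)
        => UsernameRegEx().IsMatch(username);
}
```

With this sketch, a value such as "Code_Maze" matches because of RegexOptions.IgnoreCase, while "1abc" is rejected since the pattern requires a leading letter.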
How Does Source-Generated RegEx Improve Performance in .NET?
With source-generated code, we get the option to examine it:
/// <summary>Cached, thread-safe singleton instance.</summary>
internal static readonly PasswordRegEx_0 Instance = new();

/// <summary>Initializes the instance.</summary>
private PasswordRegEx_0()
{
    base.pattern = "^(?=.*[a-z])(?=.*[A-Z])(?=.*\\d)(?=.*[^\\da-zA-Z]).{8,}$";
    base.roptions = RegexOptions.None;
    ValidateMatchTimeout(Utilities.s_defaultTimeout);
    base.internalMatchTimeout = Utilities.s_defaultTimeout;
    base.factory = new RunnerFactory();
    base.capsize = 1;
}
The first thing that sticks out is that we now get an internal, thread-safe instance that is also cached. The source generator doesn’t just initialize a new Regex instance, either. It produces code that is in many ways similar to what the regex engine emits when we use RegexOptions.Compiled, except that it is generated at build time. We get all the benefits of a compiled regular expression plus some start-up-related benefits, since the pattern no longer has to be parsed and compiled at run time.
Moreover, the generated code is very similar to the DynamicMethod instances produced with RegexOptions.Compiled. The C# compiler is then responsible for translating the generated code into IL, which can bring further optimizations and performance improvements. For example, if the generated code contains a switch statement, the compiler has a myriad of ways to improve it in the produced IL code. This is something we don’t get when we instantiate a Regex or use the static Regex methods.
Measuring Performance Improvement of Source-Generated RegEx
Let’s start by installing the BenchmarkDotNet package:
dotnet add package BenchmarkDotNet
Then, let’s create our benchmarking class:
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Columns;
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Reports;

[MemoryDiagnoser(true)]
[Config(typeof(StyleConfig))]
public class RegexBenchmarks
{
    private const string Password = "c0d3-MaZ3-Pa55w0rd";

    [Benchmark(Baseline = true)]
    public void RegularRegex()
        => PasswordValidator.ValidatePasswordWithRegularRegEx(Password);

    [Benchmark]
    public void CompiledRegex()
        => PasswordValidator.ValidatePasswordWithCompiledRegEx(Password);

    [Benchmark]
    public void SourceGeneratedRegex()
        => PasswordValidator.ValidatePasswordWithSourceGeneratedRegEx(Password);

    // Displays the Ratio column as a trend, e.g. "47.45x faster".
    private class StyleConfig : ManualConfig
    {
        public StyleConfig()
            => SummaryStyle = SummaryStyle.Default.WithRatioStyle(RatioStyle.Trend);
    }
}
First, we create the RegexBenchmarks class. Then, we decorate it with the MemoryDiagnoser and Config attributes. The former measures memory allocation, while the latter applies our StyleConfig, which displays the Ratio column as a trend relative to the RegularRegex() baseline.
One final thing left to do:
BenchmarkRunner.Run<RegexBenchmarks>();
In our Program class, we pass the benchmarking class to the BenchmarkRunner and make sure that we run our application in Release mode.
To learn more about BenchmarkDotNet, check out our article Introduction to Benchmarking in C# and ASP.NET Core Projects.
Next, let’s run the benchmark:
| Method               |        Mean |     Error |     StdDev |         Ratio |   Gen0 |   Gen1 | Allocated |
|--------------------- |------------:|----------:|-----------:|--------------:|-------:|-------:|----------:|
| RegularRegex         | 3,951.07 ns | 78.745 ns | 135.831 ns |      baseline | 0.9918 | 0.0153 |    6288 B |
| CompiledRegex        |    85.42 ns |  0.263 ns |   0.233 ns | 47.45x faster |      - |      - |         - |
| SourceGeneratedRegex |    71.34 ns |  0.358 ns |   0.299 ns | 56.71x faster |      - |      - |         - |
When we look at the results, we see that the method that creates a new Regex instance on every call has an execution time of 3,951.07 nanoseconds, which is our baseline. It allocates 6,288 bytes of memory and is the slowest of the three.
In second place, we have the compiled RegEx method with a run time of 85.42 nanoseconds, which is 47.45 times faster than our baseline method, and it allocates no memory.
Finally, we have the source-generated one, which is about 14 nanoseconds faster than the compiled one. It is 56.71 times faster than our baseline and, like the compiled version, allocates no memory.
Conclusion
In this article, we explored how source-generated RegEx can improve performance in .NET applications. When comparing a Regex instance created on every call with compiled RegEx and source-generated RegEx, we see a significant performance difference. In our benchmark, the compiled RegEx method was approximately 47 times faster than the baseline. The source-generated RegEx method surpassed even this, running about 56 times faster than the baseline. These findings show how much source-generated RegEx can contribute to the performance of our .NET applications.