In this article, we’ll look at source-generated RegEx and how it can improve performance in our .NET applications.

To download the source code for this article, you can visit our GitHub repository.

Let’s dive in!

How Does RegEx Work?

Regular expressions are vital to the programming world, but do we know how they work in .NET?

Let’s examine an example:

public static class PasswordValidator
{
    public static bool ValidatePasswordWithRegularRegEx(string password)
    {
        var regex = new Regex(@"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[^\da-zA-Z]).{8,}$");

        return regex.IsMatch(password);
    }
}

Inside the ValidatePasswordWithRegularRegEx() method, we create a new instance of the Regex class by passing a pattern to the constructor. The pattern validates that a password is at least 8 characters long and contains at least one lower-case letter, one upper-case letter, one digit, and one special character.
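To make these rules concrete, here is a small sketch that runs the same pattern against a handful of sample passwords. The sample strings and the PatternDemo class are made up for illustration and are not part of the article's repository:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical inputs: the first two satisfy every rule,
// each of the others breaks exactly one of them.
string[] samples =
{
    "c0d3-MaZ3-Pa55w0rd", // valid
    "Passw0rd!",          // valid
    "password1!",         // no upper-case letter
    "PASSWORD1!",         // no lower-case letter
    "Password!",          // no digit
    "Passw0rd",           // no special character
    "P@ss1"               // shorter than 8 characters
};

foreach (var sample in samples)
{
    Console.WriteLine($"{sample} -> {PatternDemo.IsValid(sample)}");
}

public static class PatternDemo
{
    // Same pattern as above: four lookaheads plus a minimum length of 8.
    private const string Pattern =
        @"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[^\da-zA-Z]).{8,}$";

    public static bool IsValid(string password) => Regex.IsMatch(password, Pattern);
}
```

Only the first two samples should be reported as valid. Each lookahead checks one rule independently, so the order of the characters in the password does not matter.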

A few things happen when we use either the constructor or one of the static methods on the Regex class. First, the pattern we pass is parsed to ensure its validity. Then it is turned into a node tree representation of that pattern. Next, the tree is transformed into a set of instructions that the internal RegexInterpreter engine can interpret. Finally, when we try to match something against that pattern, the interpreter goes over those instructions and evaluates them against the input. Note that all of this happens at run time, not when we build our application.
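Because the parsing and instruction generation happen whenever a Regex is constructed, a common way to avoid repeating that work is to construct the instance once and reuse it. The following is a minimal sketch of that idea, not the approach used in the rest of this article. The CachedPasswordValidator class name is hypothetical:

```csharp
using System;
using System.Text.RegularExpressions;

Console.WriteLine(CachedPasswordValidator.Validate("c0d3-MaZ3-Pa55w0rd")); // True
Console.WriteLine(CachedPasswordValidator.Validate("password"));           // False

public static class CachedPasswordValidator
{
    // Constructed once when the class is first used, so the pattern is
    // parsed and turned into interpreter instructions only a single time.
    private static readonly Regex PasswordRegex =
        new(@"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[^\da-zA-Z]).{8,}$");

    // Every call reuses the same instance; Regex is safe to share across threads.
    public static bool Validate(string password) => PasswordRegex.IsMatch(password);
}
```

The static methods such as Regex.IsMatch() achieve a similar effect through an internal cache of recently used patterns, whose size we can control with the Regex.CacheSize property.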

If you want to know more about the basics of regular expressions, pay a visit to our article Introduction to Regular Expressions in C#.

How Does Compiled RegEx Work?

With .NET, we have the option to use a compiled RegEx as well:

public static bool ValidatePasswordWithCompiledRegEx(string password)
    => Regex.IsMatch(
        password,
        @"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[^\da-zA-Z]).{8,}$",
        RegexOptions.Compiled);

We create the ValidatePasswordWithCompiledRegEx() method and use the static IsMatch() method to return a result. We pass it the password we wish to validate, the pattern, and the RegexOptions.Compiled enumeration value, which specifies that our application must use a compiled regular expression.

When we do this, all of the steps up to the generation of instructions for the RegexInterpreter engine stay the same. But then, the reflection-emit-based regex compiler further processes those instructions, turning them into IL instructions that are written to several DynamicMethod instances. So, when we try to perform a match, the runtime doesn’t use the interpreter but executes those DynamicMethod instances instead. This makes matching faster, but constructing the Regex becomes more costly, as additional work is needed up front.
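To get a rough feel for that extra construction cost, we could time the constructor with and without RegexOptions.Compiled. This is only a sketch under loose assumptions: the ConstructionTimer class is hypothetical, a single Stopwatch reading is noisy, and the numbers will differ from machine to machine, so no expected output is shown:

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

const string Pattern = @"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[^\da-zA-Z]).{8,}$";

// Warm-up call so that JIT compilation of the Regex internals
// does not get attributed to the first measurement.
ConstructionTimer.Measure(Pattern, RegexOptions.None);

Console.WriteLine($"Interpreted: {ConstructionTimer.Measure(Pattern, RegexOptions.None)}");
Console.WriteLine($"Compiled:    {ConstructionTimer.Measure(Pattern, RegexOptions.Compiled)}");

public static class ConstructionTimer
{
    // Times a single Regex construction. This is a one-shot measurement;
    // a proper benchmark should use a tool such as BenchmarkDotNet.
    public static TimeSpan Measure(string pattern, RegexOptions options)
    {
        var stopwatch = Stopwatch.StartNew();
        var regex = new Regex(pattern, options);
        stopwatch.Stop();

        // Keep the instance alive until after the timer has stopped.
        GC.KeepAlive(regex);

        return stopwatch.Elapsed;
    }
}
```

If construction is noticeably slower with RegexOptions.Compiled, that is the up-front price we pay for faster matching later, which is why compiled instances are usually worth it only when they are reused many times.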

What is Source-Generated RegEx in .NET?

With .NET 7, we got a new way of using regular expressions:

public static partial class PasswordValidator
{   
    public static bool ValidatePasswordWithSourceGeneratedRegEx(string password)
        => PasswordRegEx().IsMatch(password);

    [GeneratedRegex(@"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[^\da-zA-Z]).{8,}$")]
    private static partial Regex PasswordRegEx();
}

First, we mark our class as partial. Next, we declare the partial PasswordRegEx() method that returns a Regex instance and decorate it with the GeneratedRegex attribute, passing in our pattern. Finally, we create the ValidatePasswordWithSourceGeneratedRegEx() method, which calls IsMatch() on the Regex instance returned by PasswordRegEx().

When we use the GeneratedRegex attribute on a partial method that returns a Regex instance, the internal source generator recognizes this and provides all the necessary logic behind the scenes.
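Since the source generator needs everything at compile time, any options and the match timeout are passed to the attribute as well. Here is a short sketch using the attribute overload that takes a pattern, RegexOptions, and a timeout in milliseconds. The HexColorValidator class, its pattern, and the 1000 ms timeout are hypothetical examples, not part of the article's code:

```csharp
using System;
using System.Text.RegularExpressions;

Console.WriteLine(HexColorValidator.IsHexColor("#FFaa00")); // True
Console.WriteLine(HexColorValidator.IsHexColor("#GGGGGG")); // False

public static partial class HexColorValidator
{
    public static bool IsHexColor(string input) => HexColorRegex().IsMatch(input);

    // The pattern, the options, and the timeout must all be compile-time
    // constants, because the matching code is generated during the build.
    [GeneratedRegex(@"^#[0-9a-f]{6}$", RegexOptions.IgnoreCase, 1000)]
    private static partial Regex HexColorRegex();
}
```

This requires .NET 7 or later, just like the password example above.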

If you want to know more about source generators and how they work, you can check out our article Source Generators in C#.

How Does Source-Generated RegEx Improve Performance in .NET?

One advantage of source-generated code is that we can inspect it:

/// <summary>Cached, thread-safe singleton instance.</summary>
internal static readonly PasswordRegEx_0 Instance = new();
    
/// <summary>Initializes the instance.</summary>
private PasswordRegEx_0()
{
    base.pattern = "^(?=.*[a-z])(?=.*[A-Z])(?=.*\\d)(?=.*[^\\da-zA-Z]).{8,}$";
    base.roptions = RegexOptions.None;
    ValidateMatchTimeout(Utilities.s_defaultTimeout);
    base.internalMatchTimeout = Utilities.s_defaultTimeout;
    base.factory = new RunnerFactory();
    base.capsize = 1;
}

The first thing that stands out is that we now get an internal, thread-safe instance that is also cached. The source generator doesn’t simply initialize a new Regex instance. Instead, it produces code that is in many ways similar to what is emitted when we use RegexOptions.Compiled. We get all the benefits of a compiled regular expression along with some start-up-related benefits, because the code is generated at build time rather than at run time.
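We can observe the cached instance from our own code. The sketch below assumes the generated partial method simply returns the static Instance field shown above. The SingletonDemo class is hypothetical, and its partial method is public only so that we can call it directly:

```csharp
using System;
using System.Text.RegularExpressions;

Regex first = SingletonDemo.PasswordRegEx();
Regex second = SingletonDemo.PasswordRegEx();

// Both calls are expected to hand back the very same object,
// so no new Regex is constructed per call.
Console.WriteLine(ReferenceEquals(first, second));

public static partial class SingletonDemo
{
    [GeneratedRegex(@"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[^\da-zA-Z]).{8,}$")]
    public static partial Regex PasswordRegEx();
}
```

Under that assumption, this prints True, which means calling PasswordRegEx() repeatedly is cheap and we don't need to store the result in a field of our own.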

Moreover, the generated C# code is very similar to the IL emitted into the DynamicMethod instances for a compiled Regex. The C# compiler is then responsible for translating the generated code into IL, which can bring further optimizations and performance improvements. For example, if the generated code contains a switch statement, the compiler has a myriad of ways to optimize it in the IL it produces. This is something we don’t get when we instantiate a Regex or use the static Regex methods.

Measuring Performance Improvement of Source-Generated RegEx

Let’s start by installing the BenchmarkDotNet package:

dotnet add package BenchmarkDotNet

Then, let’s create our benchmarking class:

[MemoryDiagnoser(true)]
[Config(typeof(StyleConfig))]
public class RegexBenchmarks
{
    private const string Password = "c0d3-MaZ3-Pa55w0rd";

    [Benchmark(Baseline = true)]
    public void RegularRegex()
        => PasswordValidator.ValidatePasswordWithRegularRegEx(Password);

    [Benchmark]
    public void CompiledRegex()
        => PasswordValidator.ValidatePasswordWithCompiledRegEx(Password);

    [Benchmark]
    public void SourceGeneratedRegex()
        => PasswordValidator.ValidatePasswordWithSourceGeneratedRegEx(Password);

    private class StyleConfig : ManualConfig
    {
        public StyleConfig()
            => SummaryStyle = SummaryStyle.Default.WithRatioStyle(RatioStyle.Trend);
    }
}

First, we create the RegexBenchmarks class. Then, we decorate it with the MemoryDiagnoser and Config attributes. The former measures memory allocation, and the latter applies our StyleConfig, which formats the Ratio column to show how much faster or slower each method is compared to the RegularRegex() baseline.

One final thing left to do:

BenchmarkRunner.Run<RegexBenchmarks>();

In our Program class, we run the benchmarking class. We also need to run our application in Release mode, for example with dotnet run -c Release.

For a detailed look at using BenchmarkDotNet, check out our article Introduction to Benchmarking in C# and ASP.NET Core Projects.

Next, let’s run the benchmark:

| Method               | Mean        | Error     | StdDev     | Ratio         | Gen0   | Gen1   | Allocated |
|--------------------- |------------:|----------:|-----------:|--------------:|-------:|-------:|----------:|
| RegularRegex         | 3,951.07 ns | 78.745 ns | 135.831 ns |      baseline | 0.9918 | 0.0153 |    6288 B |
| CompiledRegex        |    85.42 ns |  0.263 ns |   0.233 ns | 47.45x faster |      - |      - |         - |
| SourceGeneratedRegex |    71.34 ns |  0.358 ns |   0.299 ns | 56.71x faster |      - |      - |         - |

When we look at the results, we see that the method creating a new Regex instance on every call has a mean execution time of 3,951.07 nanoseconds, which is also our baseline. It allocates 6,288 bytes of memory and is the slowest of the three.

In second place, we have the compiled RegEx method with a mean run time of 85.42 nanoseconds, which is 47.45 times faster than our baseline method, with no memory allocation.

Finally, we have the source-generated one, which is about 14 nanoseconds faster than the compiled one and also allocates no memory. That makes it 56.71 times faster than our baseline.

Conclusion

In this article, we explored how source-generated RegEx can improve performance in .NET applications. When comparing a Regex instantiated on every call with compiled RegEx and source-generated RegEx, we see a significant performance difference. In our benchmark, the compiled RegEx method is approximately 47 times faster than the baseline. The source-generated RegEx method surpasses even this, running about 57 times faster than the baseline, with no memory allocation. These findings show that source-generated RegEx is well worth considering when we want to optimize our .NET applications.
