This article will explore the quickest method to find and extract a number from a string in C#. We will discuss and implement different techniques for accomplishing this task. Subsequently, we will evaluate the performance of these techniques using the BenchmarkDotNet library.

To download the source code for this article, you can visit our GitHub repository.

Let’s dive in.

Find and Extract a Number From a String Using Regular Expressions

Regular expressions are a powerful tool for searching and parsing text. .NET 7 introduced a source generator for regular expressions, which gives us the benefits of a compiled regular expression without the cost of compiling it at runtime. Since our focus in this article is performance, we will take advantage of the source generator to improve the performance of our regex method:

[GeneratedRegex(@"-?\d+(\.\d+)?")]
private static partial Regex NumberRegex();

public static string ExtractNumberUsingRegEx(string inputString)
{
    var extractedNumbers = new List<double>();

    foreach (Match match in NumberRegex().Matches(inputString))
    {
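        // Parse with the invariant culture so "." is always treated as the decimal separator.
        if (double.TryParse(match.Value, NumberStyles.Float, CultureInfo.InvariantCulture, out var parsedNumber))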
        {
            extractedNumbers.Add(parsedNumber);
        }
    }

    return string.Join(",", extractedNumbers);
}

Here, the [GeneratedRegex] attribute instructs the regex source generator to provide the implementation of the partial NumberRegex() method at compile time. The pattern (-?\d+(\.\d+)?) matches integers and decimal numbers with an optional leading minus sign. Subsequently, we use the Regex.Matches() method to locate all matches of the pattern within the input string, resulting in a MatchCollection. We attempt to parse each match and return a comma-separated string of the valid numbers found.
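To make the pattern's behavior concrete, here is a minimal sketch that lists the raw matches for a sample input. It builds the regex with the Regex constructor instead of the source generator so that it can run as a standalone snippet. The RegexPatternDemo class, the FindMatches() method, and the sample string are hypothetical names chosen for illustration:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RegexPatternDemo
{
    // Same pattern as NumberRegex(), but constructed at runtime for a standalone demo.
    private static readonly Regex Pattern = new(@"-?\d+(\.\d+)?");

    // Returns the raw text of every match, before any parsing takes place.
    public static List<string> FindMatches(string input)
    {
        var results = new List<string>();

        foreach (Match match in Pattern.Matches(input))
        {
            results.Add(match.Value);
        }

        return results;
    }
}

// RegexPatternDemo.FindMatches("Order -12 costs 3.50, discount .5")
// yields "-12", "3.50", and "5"
```

Notice that ".5" is reported as "5", because the pattern requires at least one digit before the decimal point.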

Using LINQ to Find and Extract a Number From a String

Another approach to extracting numbers from a string combines the char.IsBetween() method with LINQ (Language Integrated Query). Instead of writing an explicit loop over each character in the input string and checking whether it's a digit, we can use LINQ and the char.IsBetween() method to express the same filter more concisely.

This lets us create a more streamlined approach to filtering out non-numeric characters from the input string:

public static string ExtractNumbersUsingLinq(string inputString)
{    
    return string.Join(",", new string(inputString
          .Where(c => char.IsBetween(c, '0', '9') || c == '.' || c == '-' || char.IsWhiteSpace(c))
          .ToArray()).Split((char[]?)null, StringSplitOptions.RemoveEmptyEntries));  
}

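We use LINQ's Where extension method, with char.IsBetween() in the predicate, to keep only digits, decimal points, minus signs, and whitespace. After filtering, we convert the remaining characters back into a string and split it on whitespace to obtain the individual numbers. This approach avoids a hand-written loop over each character, resulting in cleaner and more concise code. Note that this method returns the filtered tokens as they are, without parsing them.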
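Because the LINQ method never parses its tokens, fragments such as a lone "-" or a version string like "1.2.3" pass through unchanged. The following sketch, assuming we want only tokens that are valid numbers, adds a double.TryParse() check on top of the same filter-and-split step. The LinqExtractionDemo class and its Tokenize() and ExtractValidated() methods are hypothetical and not part of the article's source code:

```csharp
using System;
using System.Globalization;
using System.Linq;

public static class LinqExtractionDemo
{
    // Same filter-and-split step as ExtractNumbersUsingLinq().
    public static string[] Tokenize(string input) =>
        new string(input
            .Where(c => char.IsBetween(c, '0', '9') || c == '.' || c == '-' || char.IsWhiteSpace(c))
            .ToArray())
        .Split((char[]?)null, StringSplitOptions.RemoveEmptyEntries);

    // Hypothetical extra step: keep only the tokens that parse as a number.
    public static string ExtractValidated(string input) =>
        string.Join(",", Tokenize(input)
            .Where(t => double.TryParse(t, NumberStyles.Float, CultureInfo.InvariantCulture, out _)));
}

// LinqExtractionDemo.Tokenize("Version 1.2.3 - build 45") yields "1.2.3", "-", and "45"
// LinqExtractionDemo.ExtractValidated("Version 1.2.3 - build 45") returns "45"
```

The extra parsing step costs some of the speed advantage this method has, so whether it is worthwhile depends on how clean the input is expected to be.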

Note that although the char.IsDigit() method is an option, we choose not to use it for the following reason:

For characters in the ASCII and extended ASCII (Latin-1) range, it simply checks whether the character falls within the 0 – 9 range. For every other character, however, it returns true for any character that Unicode classifies as a decimal digit, such as the Arabic-Indic digits. This might introduce unexpected complications in applications that expect only the digits 0 – 9.
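The difference is easy to see with a single non-ASCII digit. This small sketch compares the two checks; the DigitCheckDemo class and its member names are hypothetical and exist only for this illustration:

```csharp
public static class DigitCheckDemo
{
    // U+0663 ARABIC-INDIC DIGIT THREE: a Unicode decimal digit outside the ASCII range.
    public const char ArabicIndicThree = '\u0663';

    // True for any Unicode decimal digit.
    public static bool IsUnicodeDigit(char c) => char.IsDigit(c);

    // True only for the ASCII digits '0' through '9'.
    public static bool IsAsciiDigit(char c) => char.IsBetween(c, '0', '9');
}

// DigitCheckDemo.IsUnicodeDigit(DigitCheckDemo.ArabicIndicThree) is true
// DigitCheckDemo.IsAsciiDigit(DigitCheckDemo.ArabicIndicThree) is false
// Both methods return true for '7'.
```

On .NET 7 and later, char.IsAsciiDigit() is another way to express the ASCII-only check.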

Finding and Extracting a Number Using StringBuilder

To achieve better performance, especially when handling large strings or a significant number of string operations, we can use StringBuilder. Accumulating characters in a single reusable buffer avoids creating a new string on every append, which reduces memory overhead and improves processing speed.

First, let’s define an AddNumberToList() method that attempts to parse a number and, if it’s valid, adds it to a list:

[MethodImpl(MethodImplOptions.AggressiveInlining)]
private static void AddNumberToList(ReadOnlySpan<char> numberSpan, List<double> numbers)
{
    if (double.TryParse(numberSpan, NumberStyles.Any, CultureInfo.InvariantCulture, out var number))
    {
        numbers.Add(number);
    }
}

Then let’s create a method that will search for numbers within a given string and extract them using StringBuilder:

public static string ExtractNumberUsingStringBuilder(string inputString)
{
    var numbers = new List<double>();
    var currentNumber = new StringBuilder();
    var isInsideNumber = false;

    foreach (var c in inputString)
    {
        if (char.IsBetween(c, '0', '9') || c == '.' || c == '-')
        {
            currentNumber.Append(c);
            isInsideNumber = true;
        }
        else if (isInsideNumber)
        {
            AddNumberToList(currentNumber.ToString(), numbers);
            currentNumber.Clear();
            isInsideNumber = false;
        }
    }

    if (currentNumber.Length > 0)
    {
        AddNumberToList(currentNumber.ToString(), numbers);
    }

    return string.Join(",", numbers);
}

Each character is examined in the loop to determine whether it is a digit (using char.IsBetween()), a decimal point, or a minus sign. If it is, we append it to the StringBuilder. When we reach a character that cannot be part of a number, we attempt to parse the text accumulated in the StringBuilder and, if it is a valid number, add it to the numbers list. After the loop, we do the same for any text still remaining in the StringBuilder. Finally, the list of extracted numbers is concatenated and returned.

Using Span and SearchValues to Find and Extract a Number From a String

A modern approach to string parsing involves utilizing Span for improved performance and memory efficiency. Span allows for direct access to the underlying memory of a string without additional allocations.

First, we’ll define a SearchValues<char> of valid numerical characters including digits, minus sign, and decimal point:

private static readonly SearchValues<char> NumericSearchValues = SearchValues.Create(['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '-', '.']);

SearchValues<T>, which was added in .NET 8, is specifically optimized for searching, so we use it here to search our input string:

public static string ExtractNumberUsingSpan(string inputString)
{
    var numbers = new List<double>();

    var inputStringSpan = inputString.AsSpan();
    while (true)
    {
        var startIndex = inputStringSpan.IndexOfAny(NumericSearchValues);
        if (startIndex == -1)
            break;

        inputStringSpan = inputStringSpan[startIndex..];

        var endIndex = inputStringSpan.IndexOfAnyExcept(NumericSearchValues);
        if (endIndex == -1)
        {
            AddNumberToList(inputStringSpan, numbers);
            break;
        }

        AddNumberToList(inputStringSpan[..endIndex], numbers);
        inputStringSpan = inputStringSpan[endIndex..];
    }

    return string.Join(",", numbers);
}

The method uses ReadOnlySpan<char> to efficiently move through the input string, identifying and extracting numbers. It uses IndexOfAny() to find the next character that can be part of a number and IndexOfAnyExcept() to find the first character after it that cannot. Throughout the operation, we repeatedly slice the span, both to extract each number and to reduce the search space for the next iteration. All extracted numbers are added to a list, which we finally concatenate and return.
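To illustrate a single iteration of that loop, the following sketch extracts only the first run of numeric characters from a string. The SpanSliceDemo class and the FirstNumericRun() method are hypothetical names, and the SearchValues<char> instance is created from a string here purely for brevity:

```csharp
using System;
using System.Buffers;

public static class SpanSliceDemo
{
    private static readonly SearchValues<char> Numeric = SearchValues.Create("0123456789-.");

    // Returns the first run of numeric characters, or an empty string if there is none.
    public static string FirstNumericRun(string input)
    {
        ReadOnlySpan<char> span = input.AsSpan();

        // Find where the first numeric character starts.
        int start = span.IndexOfAny(Numeric);
        if (start == -1)
            return string.Empty;

        // Drop everything before the run.
        span = span[start..];

        // Find where the run ends; -1 means it extends to the end of the input.
        int end = span.IndexOfAnyExcept(Numeric);
        return end == -1 ? span.ToString() : span[..end].ToString();
    }
}

// SpanSliceDemo.FirstNumericRun("abc 12.5 xyz -7") returns "12.5"
// SpanSliceDemo.FirstNumericRun("total: 42") returns "42"
// SpanSliceDemo.FirstNumericRun("no digits") returns ""
```

The only allocation in this sketch is the final ToString() call; the slicing itself just adjusts the span's start and length.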

Performance Comparison

Let’s use BenchmarkDotNet to benchmark and compare the different methods discussed. Benchmarking empowers developers to choose the most suitable method for a given scenario, balancing readability and performance:

| Method                                | Mean     | Error    | StdDev   | Median   | Ratio | Gen0   | Allocated |
|-------------------------------------- |---------:|---------:|---------:|---------:|------:|-------:|----------:|
| ExtractNumberUsingLinqMethod          | 456.1 ns |  9.16 ns | 23.64 ns | 448.0 ns |  1.00 | 0.3633 |     760 B |
| ExtractNumberUsingSpanMethod          | 508.6 ns |  9.98 ns |  9.80 ns | 505.5 ns |  1.09 | 0.0839 |     176 B |
| ExtractNumberUsingStringBuilderMethod | 594.4 ns | 11.70 ns | 13.93 ns | 588.0 ns |  1.29 | 0.1755 |     368 B |
| ExtractNumberUsingRegExMethod         | 719.8 ns | 14.21 ns | 24.51 ns | 708.2 ns |  1.58 | 0.4358 |     912 B |

From the results, we observe that ExtractNumberUsingRegExMethod is the slowest, running approximately 1.6 times slower than our baseline, ExtractNumberUsingLinqMethod. ExtractNumberUsingStringBuilderMethod runs approximately 1.3 times slower than the baseline. Lastly, ExtractNumberUsingSpanMethod is only slightly slower than the baseline (~1.1x), but it allocates the least memory of the four: 176 B compared to 760 B for the LINQ method.

Ease of Use

In addition to performance comparison, another crucial factor is the ease of use for developers. We’ve outlined several key factors from the usability perspective for each method covered above.

Using Regex requires some familiarity with regular expressions, which can be intimidating for beginners. With Source Generators, developers can define Regex patterns and associated parsing methods straightforwardly. The generated code handles the pattern matching and extraction, abstracting away the complexities of Regex implementation.

LINQ is intuitive and easy to understand, making it suitable for developers of all levels. It provides a straightforward approach to filtering numeric characters from a string.

Using StringBuilder and char.IsBetween() requires a basic understanding of string manipulation but is easy to implement. It offers a balance between performance and simplicity, making it suitable for a wide range of scenarios.

Finally, using Span in connection with SearchValues<T> for string parsing offers improved performance and memory efficiency without sacrificing ease of use. It provides direct access to the underlying memory of a string, allowing for efficient pattern matching and extraction.

Conclusion

In this article, we have learned how to extract numbers from strings in C# using several methods, each with its own set of advantages. Ultimately, the method we choose should depend on our specific use case and performance requirements, weighing both speed and ease of implementation.
