Counting Occurrences of a Char Within a String in C#

In this article, we are going to learn how to count occurrences of a char within a string in C#. A string is a sequence of characters and we essentially iterate over it to count the number of occurrences of any character within it.

To download the source code for this article, you can visit our GitHub repository.

Let’s look at various ways to do so.

Using LINQ Count()

We can use System.Linq to count the occurrences of a character using the Count() method.

Let’s create a method to look at the usage of Count:

public int CountCharsUsingLinqCount(string source, char toFind)
{
    return source.Count(t => t == toFind);
}

The LINQ Count() method iterates over every character of the string and executes the predicate, in our case t => t == toFind, for each of them.

On executing the method, we get the number of occurrences of the character toFind in the string main:

string main = "Mary Had A Little Lamb";
char toFind = 'L';

int actual = _countChars.CountCharsUsingLinqCount(main, toFind);

Assert.Equal(2, actual);
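
To make the iteration described above concrete, here is a minimal sketch of roughly what Count() with a predicate does conceptually. CountWhere is a hypothetical helper written for illustration, not the actual LINQ implementation:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LinqCountSketch
{
    // Hypothetical helper: roughly what Count(predicate) does conceptually.
    public static int CountWhere<T>(IEnumerable<T> source, Func<T, bool> predicate)
    {
        int count = 0;

        // A string is an IEnumerable<char>, so each item here is one char.
        foreach (T item in source)
        {
            if (predicate(item))
                count++;
        }

        return count;
    }

    public static void Main()
    {
        string main = "Mary Had A Little Lamb";

        // The comparison is case-sensitive: 'L' and 'l' are counted separately.
        Console.WriteLine(main.Count(t => t == 'L'));        // 2
        Console.WriteLine(CountWhere(main, t => t == 'L'));  // 2
        Console.WriteLine(main.Count(t => t == 'l'));        // 1
    }
}
```

Note that the predicate compares chars exactly, so searching for 'L' does not match the lowercase 'l' in "Little".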

Using foreach Loop

A simple approach to iterating over all the characters of the given string is to use the foreach loop. This allows us to keep a count of the occurrence of any character. 

Let’s create a method to count how many times a character appears in the string using foreach:

public int CountCharsUsingForeach(string source, char toFind)
{
    int count = 0;

    foreach (var ch in source)
    {
        if (ch == toFind)
            count++;
    }

    return count;
}

We keep a counter variable count and keep increasing its value every time we encounter the desired character in the string.

Thus, this method returns the number of occurrences of toFind in the string main:

string main = "Mary Had A Little Lamb";
char toFind = 'L';

int actual = _countChars.CountCharsUsingForeach(main, toFind);

Assert.Equal(2, actual);

Using foreach Loop With Span

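We can also make use of Span<T> inside a foreach loop to count the character occurrences. Calling the AsSpan() extension method on a string returns a ReadOnlySpan<char>, a lightweight view over the string's existing characters. Creating it does not copy the string or allocate anything on the heap: the span itself is a ref struct that lives on the stack, while the characters stay where they already are. Since no new objects are created, the garbage collector gets no extra work, and in our benchmark iterating over the span is slightly faster than iterating over the string directly.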

Here, the only change from the previous approach is that we apply the AsSpan() method on the source string:

public int CountCharsUsingForeachSpan(string source, char toFind)
{
    int count = 0;

    foreach (var c in source.AsSpan())
    {
        if (c == toFind)
            count++;
    }

    return count;
}

The method returns how many times toFind occurred in the string variable main:

string main = "Mary Had A Little Lamb";
char toFind = 'L';

int actual = _countChars.CountCharsUsingForeachSpan(main, toFind);

Assert.Equal(2, actual);
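
As a minimal sketch of the span being only a view over the string, the example below counts characters in the whole string and in a slice of it. The class and method names are hypothetical and chosen for illustration:

```csharp
using System;

public static class SpanViewSketch
{
    // Counts toFind in a span; works for a whole string or any slice of it.
    public static int CountChars(ReadOnlySpan<char> source, char toFind)
    {
        int count = 0;

        foreach (char c in source)
        {
            if (c == toFind)
                count++;
        }

        return count;
    }

    public static void Main()
    {
        string main = "Mary Had A Little Lamb";

        // AsSpan() wraps the string's existing characters; nothing is copied.
        ReadOnlySpan<char> whole = main.AsSpan();

        // Slice(11) is a view that starts at "Little Lamb", again without copying.
        ReadOnlySpan<char> tail = whole.Slice(11);

        Console.WriteLine(CountChars(whole, 'L'));  // 2
        Console.WriteLine(CountChars(tail, 'a'));   // 1
        Console.WriteLine(tail.ToString());         // Little Lamb
    }
}
```

Accepting a ReadOnlySpan<char> parameter lets the same method count within part of a string without creating a substring first.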

Using For Loop

Another approach to iterating over a string is to use a for loop. This works in a similar manner to foreach, where we count the number of times a character occurs in the source string:

public int CountCharsUsingFor(string source, char toFind)
{
    int count = 0;

    for (int n = 0; n < source.Length; n++)
    {
        if (source[n] == toFind)
            count++;
    }

    return count;
}

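We can also loop over a char[] obtained from the ToCharArray() method instead of indexing the string directly. Keep in mind that ToCharArray() allocates a new array and copies every character into it, so this only pays off when the array is reused often enough to offset that cost.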

This method returns the count of the character toFind in the string main on execution:

string main = "Mary Had A Little Lamb";
char toFind = 'L';

int actual = _countChars.CountCharsUsingFor(main, toFind);

Assert.Equal(2, actual);
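
The char[] variant mentioned above might look like the following sketch. CountCharsUsingForCharArray is a hypothetical name, and since ToCharArray() returns a copy of the characters, it is worth measuring before adopting it:

```csharp
using System;

public static class CharArraySketch
{
    // Hypothetical variant of the for loop that iterates over a char[] copy.
    public static int CountCharsUsingForCharArray(string source, char toFind)
    {
        // ToCharArray() allocates a new array and copies every character into it.
        char[] chars = source.ToCharArray();
        int count = 0;

        for (int n = 0; n < chars.Length; n++)
        {
            if (chars[n] == toFind)
                count++;
        }

        return count;
    }

    public static void Main()
    {
        Console.WriteLine(CountCharsUsingForCharArray("Mary Had A Little Lamb", 'L')); // 2
    }
}
```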

Using IndexOf() Method

A way to count the occurrence of a character within a string is using the IndexOf() method of the string class.

We keep a counter variable and increment it every time the expression source.IndexOf(toFind, n) + 1 returns a value greater than 0, i.e. the character was found at or after position n:

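public int CountCharsUsingIndex(string source, char toFind)
{
    int count = 0;
    int n = 0;

    // n becomes the position right after each match. When nothing is left,
    // IndexOf() returns -1, which makes n equal to 0 and ends the loop.
    while ((n = source.IndexOf(toFind, n) + 1) != 0)
    {
        count++;
    }

    return count;
}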

Similar to the previous methods, on executing the method we get the number of occurrences of the character toFind in the string main:

string main = "Mary Had A Little Lamb";
char toFind = 'L';

int actual = _countChars.CountCharsUsingIndex(main, toFind);

Assert.Equal(2, actual);
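
The same idea can be written with an explicit check for -1, which makes it easier to see how the search position advances. This is a sketch under the assumption that each search should resume exactly one position past the previous match. CountCharsUsingIndexOf is a hypothetical name:

```csharp
using System;

public static class IndexOfSketch
{
    // Hypothetical variant: restart each search one position past the last match.
    public static int CountCharsUsingIndexOf(string source, char toFind)
    {
        int count = 0;
        int index = source.IndexOf(toFind);

        while (index != -1)
        {
            count++;

            // index + 1 is at most source.Length, which IndexOf() accepts
            // as a start index (it then returns -1).
            index = source.IndexOf(toFind, index + 1);
        }

        return count;
    }

    public static void Main()
    {
        Console.WriteLine(CountCharsUsingIndexOf("Mary Had A Little Lamb", 'L')); // 2
        Console.WriteLine(CountCharsUsingIndexOf("LL", 'L'));                     // 2 (adjacent matches)
        Console.WriteLine(CountCharsUsingIndexOf("aL", 'L'));                     // 1 (match at the last position)
        Console.WriteLine(CountCharsUsingIndexOf("", 'L'));                       // 0
    }
}
```

Adjacent matches and a match at the very end of the string are the two cases worth testing, because advancing the start index too far either skips a character or passes an out-of-range start index to IndexOf().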

Using Split() Method

We can count the occurrences of a character using the Split() method:

public int CountCharsUsingSplit(string source, char toFind)
{
    return source.Split(toFind).Length - 1;
}

In this example, we split the main string using the character as a delimiter. This results in an array of strings whose length is one more than the number of occurrences of the character.

We can execute this method to find the occurrence of the character:

string main = "Mary Had A Little Lamb";
char toFind = 'L';

int actual = _countChars.CountCharsUsingSplit(main, toFind);

Assert.Equal(2, actual);
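
To illustrate why the array is one element longer than the number of occurrences, this small sketch prints the pieces that Split() produces. The Pieces helper is hypothetical and only wraps the call:

```csharp
using System;

public static class SplitSketch
{
    // Hypothetical helper that exposes the pieces produced by Split().
    public static string[] Pieces(string source, char toFind)
    {
        return source.Split(toFind);
    }

    public static void Main()
    {
        // Two delimiters cut the string into three pieces.
        string[] parts = Pieces("Mary Had A Little Lamb", 'L');

        foreach (string part in parts)
            Console.WriteLine($"[{part}]");
        // [Mary Had A ]
        // [ittle ]
        // [amb]

        Console.WriteLine(parts.Length - 1);            // 2

        // Adjacent delimiters produce empty pieces, so the arithmetic still holds.
        Console.WriteLine(Pieces("LL", 'L').Length - 1); // 2
    }
}
```

Every piece is a newly allocated string, which is consistent with the memory this approach allocates in the benchmark results.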

Using String Replace() Method

Let’s create a method to count the number of occurrences of a character using the Replace() method:

public int CountCharsUsingReplace(string source, char toFind)
{
    return source.Length - source.Replace(toFind.ToString(), "").Length;
}

Here, we convert the character to a string and replace it with an empty string. We then find the difference between the length of the original string and the resulting string.

On executing the method, we get the number of occurrences of a character:

string main = "Mary Had A Little Lamb";
char toFind = 'L';

int actual = _countChars.CountCharsUsingReplace(main, toFind);

Assert.Equal(2, actual);
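
The length arithmetic can be traced step by step with the sample string. This is a small sketch, and the Strip helper is a hypothetical name:

```csharp
using System;

public static class ReplaceSketch
{
    // Hypothetical helper: removes every occurrence of toFind from source.
    public static string Strip(string source, char toFind)
    {
        return source.Replace(toFind.ToString(), "");
    }

    public static void Main()
    {
        string main = "Mary Had A Little Lamb";
        string stripped = Strip(main, 'L');

        Console.WriteLine(stripped);                       // Mary Had A ittle amb
        Console.WriteLine(main.Length);                    // 22
        Console.WriteLine(stripped.Length);                // 20
        Console.WriteLine(main.Length - stripped.Length);  // 2
    }
}
```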

Using Regex Pattern Matching

We can also use Regex pattern matching to count the number of occurrences of a character.

Let’s create a method to do so:

public int CountCharsUsingRegex(string source, char toFind)
{
    return new Regex(Regex.Escape(toFind.ToString())).Matches(source).Count;
}

We can then execute this method to count the occurrences of a character within a string:

string main = "Mary Had A Little Lamb";
char toFind = 'L';

int actual = _countChars.CountCharsUsingRegex(main, toFind);

Assert.Equal(2, actual);
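
The call to Regex.Escape() in the method above matters when the character has a special meaning in a regular expression. The sketch below compares an escaped and an unescaped pattern. Both helper names are hypothetical:

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexEscapeSketch
{
    public static int CountEscaped(string source, char toFind)
    {
        // Escape() turns metacharacters such as '.' into literals ("\.").
        return Regex.Matches(source, Regex.Escape(toFind.ToString())).Count;
    }

    public static int CountUnescaped(string source, char toFind)
    {
        // Without Escape(), '.' is the "any character" pattern.
        return Regex.Matches(source, toFind.ToString()).Count;
    }

    public static void Main()
    {
        Console.WriteLine(CountEscaped("a.b.c", '.'));    // 2
        Console.WriteLine(CountUnescaped("a.b.c", '.'));  // 5
    }
}
```

Without escaping, the dot matches each of the five characters in "a.b.c", so the count is wrong.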

Using Count() From the CommunityToolkit.HighPerformance Package

If we install the CommunityToolkit.HighPerformance package (Install-Package CommunityToolkit.HighPerformance), we get many extension methods aimed at improving the performance of our code. One of them is the Count() method that extends the ReadOnlySpan<T> type:

public int CountCharsUsingSpanCount(string source, char toFind)
{
    return source.AsSpan().Count(toFind);
}

This method improves the performance of our code, as we will see in the performance test.

We have to give thanks to our reader Joel for mentioning this in the comment section.

Performance Comparison

Now that we have looked at various methods to count the occurrences of characters within a string, let’s see how they perform against each other.

We’ll be using BenchmarkDotNet to run the performance benchmarks.

First, let’s create a method GenerateStringWithCharArgs():

public IEnumerable<object[]> GenerateStringWithCharArgs()
{
    yield return new object[] { "Mary had a little lamb", 'l' };
}

This method will help us run performance tests on all the methods mentioned above that count the occurrences of a character in a string.

To performance test the methods with BenchmarkDotNet, we must mark them with the Benchmark and ArgumentsSource attributes:

[Benchmark]
[ArgumentsSource(nameof(GenerateStringWithCharArgs))]
public int CountCharsUsingLinqCount(string source, char toFind)

The attribute ArgumentsSource takes the name of the public method that is going to provide the values.
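
Put together, a benchmark class might look like the sketch below. It requires the BenchmarkDotNet NuGet package. The class name CountCharsBenchmark is hypothetical, and the MemoryDiagnoser attribute is an assumption based on the Gen0 and Allocated columns in the results:

```csharp
using System.Collections.Generic;
using System.Linq;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

// Assumption: MemoryDiagnoser is enabled, since the results report allocations.
[MemoryDiagnoser]
public class CountCharsBenchmark
{
    // Supplies the (source, toFind) argument pairs for the benchmark methods.
    public IEnumerable<object[]> GenerateStringWithCharArgs()
    {
        yield return new object[] { "Mary had a little lamb", 'l' };
    }

    [Benchmark]
    [ArgumentsSource(nameof(GenerateStringWithCharArgs))]
    public int CountCharsUsingLinqCount(string source, char toFind)
    {
        return source.Count(t => t == toFind);
    }
}

public static class Program
{
    public static void Main()
    {
        // BenchmarkDotNet expects an optimized (Release) build.
        BenchmarkRunner.Run<CountCharsBenchmark>();
    }
}
```

The remaining methods from this article would be added to the same class with the same two attributes.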

Let’s assess the performance results:

|                             Method |        Mean |     Error |    StdDev |   Gen0 | Allocated |
|----------------------------------- |------------:|----------:|----------:|-------:|----------:|
|           CountCharsUsingSpanCount |    11.89 ns |  0.132 ns |  0.110 ns |      - |         - |
|         CountCharsUsingForeachSpan |    15.11 ns |  0.331 ns |  0.524 ns |      - |         - |
|                 CountCharsUsingFor |    15.84 ns |  0.351 ns |  0.888 ns |      - |         - |
| CountCharsUsingForReverseIteration |    16.14 ns |  0.174 ns |  0.145 ns |      - |         - |
|             CountCharsUsingForeach |    17.51 ns |  0.384 ns |  0.768 ns |      - |         - |
|         CountCharsUsingForWithSpan |    22.59 ns |  0.354 ns |  0.331 ns |      - |         - |
|               CountCharsUsingIndex |    29.22 ns |  0.572 ns |  0.535 ns |      - |         - |
|             CountCharsUsingReplace |    86.07 ns |  0.999 ns |  0.934 ns | 0.0210 |      88 B |
|               CountCharsUsingSplit |    87.02 ns |  1.646 ns |  1.459 ns | 0.0478 |     200 B |
|           CountCharsUsingLinqCount |   188.43 ns |  1.331 ns |  1.112 ns | 0.0286 |     120 B |
|               CountCharsUsingRegex | 1,659.42 ns | 29.571 ns | 37.398 ns | 0.6886 |    2880 B |

There are a few more methods in the benchmark result, and if you want, you can inspect their implementation in the source code.

We can see that CountCharsUsingSpanCount and CountCharsUsingForeachSpan, both of which work on a ReadOnlySpan<char>, are the fastest approaches. Plain for and foreach loops over the string follow closely, while CountCharsUsingForWithSpan is somewhat slower than either. All of these iteration methods, alongside the IndexOf() method, allocate no memory.

On the other hand, Regex pattern matching is the slowest solution to find the number of occurrences of a character in a string.

Conclusion

In this article, we learned about counting the number of occurrences of a character within a string. We looked at different ways to do so using LINQ, iterating over the string, and other built-in methods.

Code Maze

Comments

  • It's really a shame that the most "native" way to do it (LINQ Count()) comes out so badly. The culprit is the lambda function, which adds a call/return to each iteration of the loop. The actual comparison is very fast, but pushing the parameters and calling the function is many times slower. If MSFT would come up with a JITer that inlined lambdas, it would be an incredible production boon for .NET coding.

  • There's one option you're missing: SpanExtensions.Count from the CommunityToolkit.HighPerformance NuGet package (previously known as Microsoft.Toolkit.HighPerformance).

    SpanExtensions.Count<T>(Span<T>, T) Method (Microsoft.Toolkit.HighPerformance.Extensions) | Microsoft Learn

    This method uses CPU hardware intrinsics like SSE2 and AVX to speed things up, and on my machine it beats all the other approaches by a wide margin. On my machine:

    CountCharsUsingSpanCount: 4.732 ns
    CountCharsUsingFor: 15.393 ns

    So the CommunityToolkit.HighPerformance version is more than 3 times faster than the fastest option in this article.

    @James Curran - for LINQ Count() there's more than one culprit. One of them is the lambda function, but the other is virtual method calls for GetEnumerator/MoveNext/Current in any IEnumerable<T> based code. I'm afraid that inlining the lambda would still not make the LINQ version the fastest. Also I'd argue that using hardware intrinsics is more "native" than LINQ could ever be, but I guess that depends on what you mean by "native".

    • Hi Joel. Thanks a lot for the comment. To be honest, I personally didn't know about the package, so this is truly great info. I couldn't get a similar result, in terms of such a big difference in speed, on my machine, but still, it is the fastest way using the ReadOnlySpan.Count extension method. So, we will definitely update the article to include this valuable info.

    • I meant "native" in the sense of a (seemingly) instance method whose specific function is what we want to do. It would be natural to assume that is the best way to do it. (And it might be a good idea to override Enumerable.Count() for strings with an extension method calling one of the better methods.)

    • Also, I'm not convinced that GetEnumerator/MoveNext/Current is the problem. If so, CountCharsUsingFor would beat CountCharsUsingForeach by a wider margin. The only real difference between CountCharsUsingForeach and CountCharsUsingLinqCount is the lambda.

      • GetEnumerator/MoveNext/Current is certainly not the entire issue, but it's definitely part of it. I think you'll find that the compiler specializes foreach for certain types like arrays and strings, specifically to avoid the overhead of allocating an enumerator and making two virtual method calls per iteration. This is why the LINQ method allocates memory in the benchmark, but the foreach method does not.

        You could verify this by making a version of CountCharsUsingForeach that manually calls GetEnumerator/MoveNext/Current and compare it to the performance of the foreach version.

        This optimization is specific to the foreach statement, and isn't possible in LINQ extension methods, unfortunately.

Published by
Code Maze
