In this article, we’re going to look at removing all whitespace characters from a string in C#. Strings often have leading or trailing whitespace that we would like to remove, or we may even want to remove all whitespace in between. Let’s take a look at several ways in which we can remove whitespace characters from a string.
Let’s start.
What Is Whitespace?
Whitespace characters include not only the common characters that instantly come to mind, such as space (" "), tab (\t) and newline (\n), but also several other Unicode characters such as the non-breaking space (U+00A0) or the em space (U+2003).
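As a quick illustration of that definition, here is a minimal sketch that checks a handful of characters with char.IsWhiteSpace(), the same method several of the later examples rely on. The WhitespaceProbe class and its AllWhitespace() method are hypothetical names used only for this example:

```csharp
using System;

public static class WhitespaceProbe
{
    // Hypothetical helper: returns true only when every character in
    // the input is classified as whitespace by char.IsWhiteSpace().
    public static bool AllWhitespace(string candidates)
    {
        foreach (var c in candidates)
        {
            if (!char.IsWhiteSpace(c))
                return false;
        }

        return true;
    }
}
```

Calling WhitespaceProbe.AllWhitespace(" \t\n\u00A0\u2003") returns true, because the non-breaking space and the em space are treated as whitespace just like the space, tab and newline characters.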
Use Regex to Remove All Whitespace Characters from a String
Regular expressions are very powerful in finding and replacing characters in a string, and we can easily use them to replace all whitespaces with an empty string.
Let’s create a class RemoveWhitespaceMethods in which we will keep all our different methods to remove whitespace, and add a new static method RemoveWhitespacesUsingRegex() to it.
This method takes a string parameter named source, which may contain multiple whitespace characters, and uses the Regex.Replace() method to return a new string in which every whitespace character is replaced with the empty string:
    public static string RemoveWhitespacesUsingRegex(string source)
    {
        return Regex.Replace(source, @"\s", string.Empty);
    }
Now, let’s use this method to remove all whitespace:
    var sourceRegex = "\v\tHello World!\r\n";
    var resultRegex = RemoveWhitespaceMethods.RemoveWhitespacesUsingRegex(sourceRegex);
    Console.WriteLine(resultRegex); // prints 'HelloWorld!'
Improving Performance with Source Generators
In our example, we are using the static Regex.Replace() method, but if we plan on calling our method repeatedly, we can gain a noticeable performance improvement via the .NET 7 regex source generator.
Note: If you are not able to move to .NET 7, you can gain similar performance by creating the regex with the RegexOptions.Compiled option and caching it. You can see an example of this in the source code for this article.
To use the source generator, we declare a partial method decorated with [GeneratedRegex] inside a partial class, and then call Replace() on the Regex instance it returns:
    [GeneratedRegex(@"\s")]
    public static partial Regex SourceGenRemoveWhitespaceRegex();

    public static string RemoveWhitespacesUsingSourceGenRegex(string source)
    {
        return SourceGenRemoveWhitespaceRegex().Replace(source, string.Empty);
    }
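For projects that cannot use the source generator, the cached RegexOptions.Compiled alternative mentioned in the note might look roughly like the sketch below. The class, field and method names here are assumptions chosen for illustration and may differ from the article's source code:

```csharp
using System.Text.RegularExpressions;

public static class CachedRegexExample
{
    // Constructed once and reused on every call. RegexOptions.Compiled
    // costs more at startup but speeds up repeated matching.
    private static readonly Regex WhitespaceRegex =
        new Regex(@"\s", RegexOptions.Compiled);

    // Hypothetical method name, mirroring the naming used above.
    public static string RemoveWhitespacesUsingCachedRegex(string source)
    {
        return WhitespaceRegex.Replace(source, string.Empty);
    }
}
```

The important part is that the Regex instance lives in a static field, so the cost of constructing and compiling the pattern is paid only once.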
Use LINQ to Remove All Whitespace Characters from a String
We can also use LINQ to remove all whitespace characters.
Let’s again make a new method called RemoveWhitespacesUsingLinq() taking a string source as a parameter. Within this method, we use the Where() method from LINQ. Within the Where() method, we pass in an expression that determines whether a given character is whitespace or not using char.IsWhiteSpace().
A common method seen around the internet is to pass the LINQ expression into String.Concat() to combine the non-whitespace characters into a new string. As we will see later in the benchmarks section, this method is less performant than calling the string constructor directly, and so for our example we will just directly construct a string:
    public static string RemoveWhitespacesUsingLinq(string source)
    {
        return new string(source.Where(c => !char.IsWhiteSpace(c)).ToArray());
    }
Now, let’s use this method to remove whitespaces in our string:
    var sourceLinq = "\v\tHello World!\r\n";
    var resultLinq = RemoveWhitespaceMethods.RemoveWhitespacesUsingLinq(sourceLinq);
    Console.WriteLine(resultLinq); // prints 'HelloWorld!'
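For comparison, the String.Concat() variant mentioned above could be sketched as follows. The class and method names are assumptions made for this example. The only difference from the constructor-based version is how the filtered characters are turned back into a string:

```csharp
using System.Linq;

public static class LinqConcatExample
{
    // Hypothetical name. string.Concat<T>(IEnumerable<T>) appends each
    // remaining character to the result, so no intermediate char[] is
    // materialized with ToArray().
    public static string RemoveWhitespacesUsingLinqWithConcat(string source)
    {
        return string.Concat(source.Where(c => !char.IsWhiteSpace(c)));
    }
}
```

Both versions produce the same output, so the choice between them comes down to the speed and allocation trade-off shown in the benchmarks.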
Use String.Replace() to Remove All Whitespace Characters from a String
String.Replace() is a straightforward way of replacing all occurrences of a character or substring within a string with another one, for instance, to replace a comma with a semicolon. In this case, we want to replace the whitespace characters with an empty string, i.e. with string.Empty.
As each String.Replace() call replaces only a single character or substring, we need to call it once per whitespace character in order to remove all of them from the string.
Let’s create a method called RemoveWhitespacesUsingReplace() which returns a new string with the whitespace characters removed. Keep in mind that this is suboptimal, as String.Replace() can create a new string on each call, so from a performance and memory usage point of view other methods might be preferred:
    public static string RemoveWhitespacesUsingReplace(string source)
    {
        foreach (var c in AllWhitespaceCharacters)
            source = source.Replace(c, string.Empty);

        return source;
    }
Usage is similar to the examples before:
    var sourceReplace = "\v\tHello World!\r\n";
    var resultReplace = RemoveWhitespaceMethods.RemoveWhitespacesUsingReplace(sourceReplace);
    Console.WriteLine(resultReplace); // prints 'HelloWorld!'
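The method above iterates over an AllWhitespaceCharacters collection that is not shown in this article. One possible way to define it, offered purely as an assumption, is to collect every character that char.IsWhiteSpace() recognizes and store each one as a single-character string, so that the Replace(string, string) overload can be used with string.Empty:

```csharp
using System.Linq;

public static class ReplaceExample
{
    // Assumed definition: every UTF-16 code unit that char.IsWhiteSpace()
    // reports as whitespace, stored as a one-character string.
    public static readonly string[] AllWhitespaceCharacters =
        Enumerable.Range(char.MinValue, char.MaxValue + 1)
            .Select(i => (char)i)
            .Where(c => char.IsWhiteSpace(c))
            .Select(c => c.ToString())
            .ToArray();

    // Same logic as the method above, repeated so the sketch compiles alone.
    public static string RemoveWhitespacesUsingReplace(string source)
    {
        foreach (var ws in AllWhitespaceCharacters)
            source = source.Replace(ws, string.Empty);

        return source;
    }
}
```

Building the list from char.IsWhiteSpace() keeps this approach consistent with the LINQ and StringBuilder versions, at the cost of one String.Replace() pass per whitespace character.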
Use String.Split() and String.Join() to Remove Whitespaces
The String.Split() method returns a string array containing the substrings of the source string that are delimited by the specified separators. The String.Join() method takes a collection of strings and combines them into a new string, placing a separator between the elements. We can combine the two of them to remove all whitespace characters from a string.
Let’s make a static method called RemoveWhitespacesUsingSplitJoin() to show how we can combine these two methods to accomplish our goal:
    public static string RemoveWhitespacesUsingSplitJoin(string source)
    {
        return String.Join("", source.Split(default(string[]), StringSplitOptions.RemoveEmptyEntries));
    }
For the String.Split() method, there are several overloads. In this case, we use Split(String[], StringSplitOptions). We pass in default(string[]) because we want to pass a null for the String[] separator parameter, which is then interpreted as using whitespace characters as delimiters. We also pass StringSplitOptions.RemoveEmptyEntries as the second argument, ensuring that empty entries are removed from the resulting array.
Then, we pass the output of the String.Split() method into the String.Join() method. We want to join the substrings without any spaces or commas in between, so we pass an empty string as the first argument.
Now let’s take a look at our method in action:
    var sourceSplitJoin = "\v\tHello World!\r\n";
    var resultSplitJoin = RemoveWhitespaceMethods.RemoveWhitespacesUsingSplitJoin(sourceSplitJoin);
    Console.WriteLine(resultSplitJoin); // prints 'HelloWorld!'
Use StringBuilder to Remove All Whitespace Characters from a String
This next method takes advantage of the StringBuilder to piece together our new string one character at a time. Since we know that the maximum length of our resultant string is the length of our input string, we can initialize the StringBuilder to our maximum capacity. This will help prevent reallocations within the StringBuilder.
Let’s make a static method called RemoveWhitespacesUsingStringBuilder() to remove all whitespace characters:
    public static string RemoveWhitespacesUsingStringBuilder(string source)
    {
        var builder = new StringBuilder(source.Length);
        for (int i = 0; i < source.Length; i++)
        {
            char c = source[i];
            if (!char.IsWhiteSpace(c))
                builder.Append(c);
        }

        return source.Length == builder.Length ? source : builder.ToString();
    }
Using the input string source, we create a StringBuilder and then loop over all characters of our source string. If a character is not a whitespace character, we append it to our StringBuilder. After looping over all the characters, we check whether anything was removed: if the lengths match, we return the original string as-is; otherwise, we create a new string from our builder.
Now let’s use our method to create a new string without whitespace characters:
    var sourceStringBuilder = "\v\tHello World!\r\n";
    var resultStringBuilder = RemoveWhitespaceMethods.RemoveWhitespacesUsingStringBuilder(sourceStringBuilder);
    Console.WriteLine(resultStringBuilder); // prints 'HelloWorld!'
Use a Pooled Array to Remove All Whitespace Characters from a String
This technique is similar to the one involving the StringBuilder, but by making use of the Array Pool, we are able to reduce the memory allocations in our code as well as increase the performance of our application. Let’s create a new method called RemoveWhitespacesUsingArray():
    public static string RemoveWhitespacesUsingArray(string source)
    {
        const int maxStackArray = 256;

        // if source is small enough, we can avoid heap allocation
        if (source.Length < maxStackArray)
            return RemoveWhitespacesSpanHelper(source, stackalloc char[source.Length]);

        var pooledArray = ArrayPool<char>.Shared.Rent(source.Length);
        try
        {
            return RemoveWhitespacesSpanHelper(source, pooledArray.AsSpan(0, source.Length));
        }
        finally
        {
            ArrayPool<char>.Shared.Return(pooledArray);
        }
    }

    private static string RemoveWhitespacesSpanHelper(string source, Span<char> dest)
    {
        var pos = 0;
        foreach (var c in source)
            if (!char.IsWhiteSpace(c))
                dest[pos++] = c;

        return source.Length == pos ? source : new string(dest[..pos]);
    }
There are two things to notice in this code. The first is the addition of the helper method RemoveWhitespacesSpanHelper(). We added this so that we can gain an additional performance improvement when the source string is shorter than 256 characters. In that situation, we can use a stackalloc buffer and avoid heap allocations altogether (with the exception of the final returned string, of course). The second important piece is the use of the ArrayPool. If we rent an array from the pool, we have to be sure to return it, which is why the call to Return() sits in a finally block.
Now, let’s watch our method work:
    var sourceArray = "\v\tHello World!\r\n";
    var resultArray = RemoveWhitespaceMethods.RemoveWhitespacesUsingArray(sourceArray);
    Console.WriteLine(resultArray); // prints 'HelloWorld!'
Trimming Whitespace
Sometimes we only need to remove the whitespace from the front and/or end of a string. Let’s take a look at two ways of doing that.
Use String.Trim() to Remove Leading and Trailing Whitespace Characters
String.Trim() efficiently removes both the leading and trailing whitespace characters, while all whitespace characters in the middle are unaffected. If we just need to trim whitespace from the front or the back, we can use String.TrimStart() or String.TrimEnd().
Let’s look at how we can use this to remove whitespace characters from the beginning and the end of the string while leaving the spaces in between unaffected. As earlier, we put this in a method, in this case RemoveLeadingAndTrailingWhitespacesUsingTrim():
    public static string RemoveLeadingAndTrailingWhitespacesUsingTrim(string source)
    {
        return source.Trim();
    }
Let’s also take a look at how this method works. We can see that the leading and trailing whitespace characters are gone, but the space in between the words is unaffected:
    var sourceTrim = "\v\tHello World!\r\n";
    var resultTrim = RemoveWhitespaceMethods.RemoveLeadingAndTrailingWhitespacesUsingTrim(sourceTrim);
    Console.WriteLine(resultTrim); // prints 'Hello World!'
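The one-sided variants mentioned earlier, String.TrimStart() and String.TrimEnd(), work the same way. Here is a minimal sketch, with the class and wrapper method names being assumptions made for this example:

```csharp
public static class TrimExample
{
    // Hypothetical wrapper: removes whitespace from the front only.
    public static string RemoveLeadingWhitespace(string source)
    {
        return source.TrimStart();
    }

    // Hypothetical wrapper: removes whitespace from the end only.
    public static string RemoveTrailingWhitespace(string source)
    {
        return source.TrimEnd();
    }
}
```

With the same input as before, RemoveLeadingWhitespace() keeps the trailing "\r\n" and RemoveTrailingWhitespace() keeps the leading "\v\t".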
Using Regex to Remove Leading and Trailing Whitespace Characters
For completeness, let’s take a look at how we can use a Regex to trim leading and trailing whitespace:
    [GeneratedRegex(@"(^\s+|\s+$)")]
    public static partial Regex SourceGenTrimWhitespaceRegex();

    public static string TrimWhitespacesUsingSourceGenRegex(string source)
    {
        return SourceGenTrimWhitespaceRegex().Replace(source, string.Empty);
    }
When we run our benchmarks, we will see that the regex method of string trimming is not even close to the performance of the built-in String.Trim() method, but it is always good to see that there is another way to do something.
Benchmarking Our Methods
Now that we have seen different ways of removing whitespace, it is important to benchmark them to see how they perform. Benchmarking helps us to choose which method is best for our particular situation. All of the code for running the benchmarks is available in the source code for this article.
With that being said, let’s take a look at the results.
Benchmarks for Removing Whitespace
First, let’s take a look at the results of running our whitespace removal methods. The benchmark results have been ordered from slowest to fastest. Also, for brevity, the benchmark results have been truncated. For full results, you can check out the code associated with the article:
| Method | source | Mean | Gen0 | Gen1 | Gen2 | Allocated |
|----------------------- |--------- |----------------:|---------:|---------:|---------:|----------:|
| UsingStaticRegexClass | [134416] | 2,420,527.13 ns | 58.5938 | 58.5938 | 58.5938 | 215722 B |
| UsingSplitJoin | [134416] | 1,616,248.77 ns | 248.0469 | 185.5469 | 185.5469 | 1400751 B |
| UsingCachedRegex | [134416] | 1,587,733.16 ns | 60.5469 | 60.5469 | 60.5469 | 215721 B |
| UsingLinqWithConcat | [134416] | 1,550,704.77 ns | 66.4063 | 66.4063 | 66.4063 | 215791 B |
| UsingSourceGenRegex | [134416] | 1,458,242.60 ns | 60.5469 | 60.5469 | 60.5469 | 215721 B |
| UsingLinqWithConstruct | [134416] | 1,228,482.54 ns | 175.7813 | 175.7813 | 175.7813 | 694340 B |
| UsingReplace | [134416] | 711,305.70 ns | 230.4688 | 230.4688 | 230.4688 | 739100 B |
| UsingStringBuilder | [134416] | 389,208.25 ns | 142.5781 | 142.5781 | 142.5781 | 484632 B |
| UsingArray | [134416] | 359,130.42 ns | 66.4063 | 66.4063 | 66.4063 | 215703 B |

We see from the results that the fastest method is our array-backed replacement. A very close second is the method that uses a StringBuilder. The next closest method, which uses repeated calls to String.Replace(), is about twice as slow as our array-backed method.
Hopefully, looking at the results helps to reinforce the importance of benchmarking our code. It is easy to find a code snippet that looks very clean and easy to implement, only to find out that we have drastically reduced the performance of our code.
Another thing to notice is the amount of memory allocated. The method using String.Split() and String.Join() allocates about 6.5 times as much memory as our best-performing method. In the case of the StringBuilder, we see that it allocates more than twice the memory that our array-backed method allocates. This may not be a problem in most cases, but it is something to be aware of when we start thinking about memory pressure and the impact that has on garbage collection.
Benchmarks for Trimming Whitespace
Now let’s look at the trimming benchmarks:
| Method | source | Mean | Gen0 | Allocated |
|-------------------- |----------------------------- |--------------:|-------:|----------:|
| UsingSourceGenRegex | \n\n\n\n(...)\t\t\t\t [5328] | 19,002.064 ns | 5.0354 | 10600 B |
| UsingStringTrim | \n\n\n\n(...)\t\t\t\t [5328] | 474.365 ns | 5.0502 | 10600 B |

Here we see that there is a clear winner between our two methods: String.Trim(), which is roughly 40 times faster. It would be virtually impossible for us to craft a method that beats the built-in String.Trim() method, but as always, it is good to benchmark things to see how they stack up. From this simple benchmark, we see that our Regex-based method falls well short of the mark and is definitely not something we would want to call in place of the built-in String.Trim().
Conclusion
In this article, we explored different methods of removing whitespace characters from a string. We saw that those different methods lead to the same result, but the performance of each one can be drastically different. Lastly, we benchmarked each of our methods so that we can make an informed decision when choosing which one to implement in our own code.