In software development, effective memory management plays a pivotal role, acting as a secret sauce to enhance our application’s performance. When working with C# code, managing strings, a common task, significantly impacts our program’s memory usage. Because .NET strings are immutable, repeated allocation of duplicate strings can potentially lead to excessive memory consumption and decreased performance.
To address this issue, the .NET Community Toolkit provides StringPool, a helper class that optimizes memory usage by reusing string instances. This article shows how to use StringPool to reduce string allocations, with sample code and a benchmark section to put the concept in context.
Let’s start by revisiting the venerable string and StringBuilder.
Recalling String and StringBuilder
As we already mentioned, in .NET strings are immutable, prohibiting changes to their values after creation. Any operation appearing to modify a string generates a new string instance, resulting in heightened memory allocations. In contrast, StringBuilder offers mutability, enabling modification of its content without requiring new instances. This attribute renders StringBuilder more suitable for scenarios requiring extensive string concatenation or manipulation.
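The difference can be seen in a minimal sketch that uses only the base class library (the variable names are illustrative). It shows that appending to a StringBuilder keeps working on the same object, while every string produced from it is a separate allocation, even when the content is identical:

```csharp
using System;
using System.Text;

// Strings are immutable: "modifying" one produces a new instance.
var original = "code";
var modified = original + "maze";
Console.WriteLine(ReferenceEquals(original, modified)); // False

// StringBuilder mutates its internal buffer; Append returns the same builder.
var builder = new StringBuilder("code");
var sameBuilder = builder.Append("maze");
Console.WriteLine(ReferenceEquals(builder, sameBuilder)); // True

// Each ToString() call allocates a new string, even for identical content.
var first = builder.ToString();
var second = builder.ToString();
Console.WriteLine(first == second);                // True
Console.WriteLine(ReferenceEquals(first, second)); // False
```

The last two lines are the situation StringPool targets: equal content, duplicated allocations.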
With this understanding, let’s delve into StringPool.
Understanding StringPool
StringPool actively manages a collection of string instances, with a primary aim to reduce memory allocations when generating numerous string objects, particularly from char or byte value buffers. It achieves this by reusing existing string instances from the pool, eliminating the need for allocating new ones each time.
When our application requires a new string, we can query the StringPool to see if a matching version already exists. Upon finding a match, it retrieves the existing instance instead of generating a new one. This approach contributes to a more efficient memory management strategy, especially in scenarios where identical string values are repeatedly created.
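StringPool provides a configurable data structure for string interning, a technique for storing a single copy of each distinct string. The runtime interns string literals automatically, but strings created at run time are not interned unless we call string.Intern() explicitly, and interned strings typically stay alive until the runtime shuts down. StringPool, in contrast, allows us to actively size and reset the pool as required.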
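For comparison, here is a minimal sketch of the runtime’s built-in interning, using only string.Intern() from the base class library. It illustrates the idea of keeping one copy per distinct value; it does not use StringPool itself:

```csharp
using System;

char[] chars = { 'c', 'o', 'd', 'e', 'm', 'a', 'z', 'e' };

// Two strings built at run time: equal content, separate objects.
var first = new string(chars);
var second = new string(chars);
Console.WriteLine(ReferenceEquals(first, second)); // False

// string.Intern() maps equal content to a single shared instance.
var internedFirst = string.Intern(first);
var internedSecond = string.Intern(second);
Console.WriteLine(ReferenceEquals(internedFirst, internedSecond)); // True
```

Note that string.Intern() needs an existing string as input, so the duplicate has already been allocated by the time it is interned. StringPool can look up a match directly from a span of characters or bytes.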
Syntax and Usage
To use StringPool, we need to include the CommunityToolkit.HighPerformance package:
dotnet add package CommunityToolkit.HighPerformance
With that added, let’s create a helper class to explore usage scenarios:
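    using System.Collections.Generic;
    using System.Text;
    using CommunityToolkit.HighPerformance.Buffers;

    public class StringPoolHelper
    {
        private readonly Dictionary<string, string> _cache = [];
        private StringPool _myPool;

        public bool Init(int poolSize)
        {
            _myPool = new StringPool(poolSize);

            // Request the same content twice: once as a string, once as UTF-8 bytes
            var value1 = _myPool.GetOrAdd("codemaze");
            var value2 = _myPool.GetOrAdd("codemaze"u8, Encoding.UTF8);

            // True when the pool returned the same instance both times
            return ReferenceEquals(value1, value2);
        }
    }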
Here, we create a StringPoolHelper class. We include an Init() method, which we use to initialize a private instance of StringPool. We provide a poolSize to determine the size of the pool. Lastly, we call StringPool.GetOrAdd() twice, adding the same string to our pool.
Our objective is to show that the second call returns the same string instance rather than creating a new one. We verify this in the final line by calling Object.ReferenceEquals() to compare the references of value1 and value2.
The GetOrAdd() method serves as the primary interface for StringPool, featuring three overloads: GetOrAdd(string), GetOrAdd(ReadOnlySpan<char>), and GetOrAdd(ReadOnlySpan<byte>, Encoding). In our example, we used the first and last of these overloads.
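The remaining overload, GetOrAdd(ReadOnlySpan<char>), is the one that lets us avoid creating a temporary string for a slice of a larger buffer. A minimal sketch, assuming the CommunityToolkit.HighPerformance package is referenced (the buffer content and variable names are illustrative):

```csharp
using System;
using CommunityToolkit.HighPerformance.Buffers;

// A private pool, so the result does not depend on what other code has pooled.
var pool = new StringPool();

// A larger buffer from which we only need the first eight characters.
ReadOnlySpan<char> buffer = "codemaze-article".AsSpan();
ReadOnlySpan<char> slice = buffer[..8];

// First call: no match in the pool yet, so a string is created and cached.
var fromSpan = pool.GetOrAdd(slice);

// Second call with equal content: the pooled instance is returned.
var fromSpanAgain = pool.GetOrAdd(slice);

Console.WriteLine(fromSpan);                                 // codemaze
Console.WriteLine(ReferenceEquals(fromSpan, fromSpanAgain)); // True
```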
Now, let’s invoke the Init() method:
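    // Init() is an instance method that takes a pool size; 100 is an arbitrary value
    var helper = new StringPoolHelper();
    var referenceEquals = helper.Init(100);

    Console.WriteLine($"Shared Reference Equals : {referenceEquals}");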
When we run the code and check the result, we see that the references of value1 and value2 are equal. In other words, our pool only created the string instance once.
Properties
The StringPool class contains two properties:
| Property | Description |
|---|---|
| Shared | Static property that offers a singleton reusable instance for access. |
| Size | Instance property that states how many strings can be stored in the current pool. |
The StringPool.Shared instance pools string instances and offers thread-safe access, so concurrent callers need no manual synchronization. It is similar to ArrayPool.Shared in that both are preconfigured for general application scenarios. Instead of initializing custom instances manually, it is recommended to use StringPool.Shared, as it is tuned for most application use cases.
Let’s illustrate these properties in practice. We’ll start with Size, creating a GetMyPoolSize() method that returns the size of the pool initialized in our Init() method:
public int GetMyPoolSize() => _myPool.Size;
Now, let’s take a look at the static StringPool.Shared property:
    public static bool UseSharedInstance()
    {
        var value1 = StringPool.Shared.GetOrAdd("codemaze");
        var value2 = StringPool.Shared.GetOrAdd("codemaze"u8, Encoding.UTF8);

        return ReferenceEquals(value1, value2);
    }
Here, we introduce the UseSharedInstance() method, which is similar to our Init() method, except that this time we use the StringPool.Shared instance. We will use the shared instance in the subsequent examples.
Methods
The StringPool class offers several methods. We have already looked at GetOrAdd(). Note that this method has overloads that accept a ReadOnlySpan<char>, or a ReadOnlySpan<byte> together with an Encoding. It retrieves a cached string instance that matches the input content (decoded with the supplied Encoding when the input is a byte span), or creates a new instance from the input if no match is found.
Let’s review a few other methods. Add(String) adds a string to the pool. Note that performing the add operation twice with the same value stores only one string, since the method checks for duplicates internally. TryGet(ReadOnlySpan<Char>, out String) attempts to retrieve a cached string instance that matches the provided input content, and returns false if no value is found. Reset() resets the current pool instance along with its associated maps, returning all the internal collections and data structures to their initial state.
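The three methods described above can be tried out together in a short sketch. It assumes the CommunityToolkit.HighPerformance package is referenced and uses a private pool rather than StringPool.Shared, so that resetting it does not affect other code:

```csharp
using System;
using CommunityToolkit.HighPerformance.Buffers;

var pool = new StringPool();

// Adding the same value twice leaves a single entry in the pool.
pool.Add("codemaze");
pool.Add("codemaze");

// TryGet looks up by content and never creates a string.
bool found = pool.TryGet("codemaze".AsSpan(), out var cached);
Console.WriteLine($"{found}: {cached}"); // True: codemaze

bool unknownFound = pool.TryGet("unknown".AsSpan(), out _);
Console.WriteLine(unknownFound); // False

// Reset clears the pool, so the earlier entry is gone.
pool.Reset();
bool foundAfterReset = pool.TryGet("codemaze".AsSpan(), out _);
Console.WriteLine(foundAfterReset); // False
```

TryGet is useful when we only want to know whether a value has been pooled, without adding it as a side effect the way GetOrAdd does.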
Scenarios Where StringPool Helps Reduce String Allocations
Let’s examine some scenarios in which using StringPool can noticeably reduce string allocations. First, let’s define a helper method:
    private static string CombineSpan(ReadOnlySpan<char> first, ReadOnlySpan<char> second)
    {
        // Rent a buffer from the shared ArrayPool; 'using' returns it when the method exits
        using var combinedSpan = SpanOwner<char>.Allocate(first.Length + second.Length);
        var combined = combinedSpan.Span;

        // Copy both inputs into the rented buffer, one after the other
        first.CopyTo(combined);
        second.CopyTo(combined[first.Length..]);

        return StringPool.Shared.GetOrAdd(combinedSpan.Span);
    }
Here we create a CombineSpan() method that takes two ReadOnlySpan<char> parameters. (In our GitHub repository we actually create two overloads, one with two and one with three parameters, but for brevity’s sake only the first is shown here.) We use ReadOnlySpan<char> values because we do not want to perform any string concatenation in our method. Our goal in using StringPool is to avoid creating a new string object whenever possible. Remember that, due to the immutability of strings, concatenation results in the creation of a new string instance.
To further reduce new memory allocations, we also use SpanOwner<char> from the high-performance toolkit, which rents its buffer from the shared ArrayPool. The buffer goes back to the pool when the SpanOwner<char> is disposed.
In the final line, we use the SpanOwner<char>.Span value to query the StringPool and retrieve a string instance. If the value already exists in the pool, the existing instance is returned. Otherwise, a new string is created and added to the pool before being returned.
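The three-parameter overload mentioned above is not shown in this article. The following is only a sketch of what it could look like, assuming it mirrors the two-parameter version; the actual implementation in the repository may differ. The buffer is declared with using so that it returns to the shared ArrayPool when the method exits:

```csharp
using System;
using CommunityToolkit.HighPerformance.Buffers;

var key1 = CombineSpan("LOCALIZATION_", "en", "greeting");
var key2 = CombineSpan("LOCALIZATION_", "en", "greeting");

Console.WriteLine(key1);                        // LOCALIZATION_engreeting
Console.WriteLine(ReferenceEquals(key1, key2)); // True: the second call reuses the pooled string

// Hypothetical three-span overload, modeled on the two-span helper.
static string CombineSpan(
    ReadOnlySpan<char> first,
    ReadOnlySpan<char> second,
    ReadOnlySpan<char> third)
{
    // Rent a buffer large enough for all three parts.
    using var owner = SpanOwner<char>.Allocate(first.Length + second.Length + third.Length);
    var combined = owner.Span;

    // Copy the parts back to back into the rented buffer.
    first.CopyTo(combined);
    second.CopyTo(combined[first.Length..]);
    third.CopyTo(combined[(first.Length + second.Length)..]);

    // Look the combined content up in the pool, creating the string only if needed.
    return StringPool.Shared.GetOrAdd(combined);
}
```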
Using StringPool to Reduce String Allocations in Caching
In caching scenarios, where keys are dynamically generated based on input parameters or identifiers, using StringPool to intern cache keys can streamline cache lookup operations and minimize memory usage:
    public bool AddUser(ReadOnlySpan<char> nameSpan, ReadOnlySpan<char> emailSpan)
    {
        var cacheKey = CombineSpan("USER_", nameSpan);
        var cacheValue = StringPool.Shared.GetOrAdd(emailSpan);

        _cache[cacheKey] = cacheValue;

        return true;
    }

    public string GetUser(ReadOnlySpan<char> nameSpan)
    {
        var cacheKey = CombineSpan("USER_", nameSpan);

        return _cache.TryGetValue(cacheKey, out var value) ? value : string.Empty;
    }
In this example, we introduce two methods designed to manage a caching scenario. The AddUser() method is responsible for caching user data. Rather than allocating a new cache key string for each request, we construct the key by calling our helper method CombineSpan(), which returns a pooled instance when the same key has been built before.
Next, we attempt to retrieve the value (in this case, the email instance) from our pool. And finally, using our cache key, we set the cache value.
Conversely, we use the GetUser() method to retrieve user data from the cache. Once again we leverage our helper method to construct the cache key. Subsequently, we access the cached value using this key. If the value is not found, we return an empty string.
Using StringPool to Reduce String Allocations in Request Url Management
In web applications, it’s typical to handle and process URLs regularly. Interning frequently accessed URL segments or patterns can optimize memory usage, resulting in quicker URL processing. For example, our application may have a publicly reachable API for which we may want to track how many requests come from a particular client.
Instead of handling this requirement by creating string instances and performing string operations, we may use StringPool to reduce string allocations and enhance overall application performance:
    public static string GetHostName(ReadOnlySpan<char> urlSpan)
    {
        // Skip the scheme separator ("://") if it is present
        var offset = urlSpan.IndexOf([':', '/', '/']);
        var start = offset == -1 ? 0 : offset + 3;

        // Length of the host name: distance from 'start' to the next '/'
        var length = urlSpan[start..].IndexOf('/');
        if (length == -1)
            return string.Empty;

        var hostName = urlSpan.Slice(start, length);

        return StringPool.Shared.GetOrAdd(hostName);
    }
Here, we extract the hostname from a given urlSpan, represented by a ReadOnlySpan<char>. We first find the starting position of the hostname within the URL by searching for the occurrence of ://. Then, we determine the end position of the hostname by searching for the next occurrence of /. If no / is found, we return an empty string. Finally, we fetch the hostName from the shared pool instance, so that repeated requests from the same host reuse one string instance instead of allocating duplicates.
Using StringPool to Reduce String Allocations in Localization
In internationalization and localization tasks, where we need to translate strings into different languages, using StringPool for keys or identifiers for localized strings can improve performance and simplify language resource management:
    public string Translate(ReadOnlySpan<char> keySpan, ReadOnlySpan<char> langSpan)
    {
        const string prefix = "LOCALIZATION_";
        var calculatedKey = CombineSpan(prefix, langSpan, keySpan);

        _cache.TryGetValue(calculatedKey, out var value);

        return value ?? calculatedKey;
    }
In this scenario, we introduce the Translate() method to obtain the translation of a key in a designated language. We form the translation key by combining a LOCALIZATION_ prefix, the langSpan value, and the provided keySpan by calling the helper method CombineSpan(). We then try to retrieve the localized value from the translation cache, returning it if it is found, or the calculated key if not.
Comparing the Performance
So far, we’ve explored the concept of StringPool and its potential for optimizing string allocations. Now, we’re ready to compare the performance of string, StringBuilder, and StringPool in terms of speed, memory allocation, and efficiency. To streamline this comparison, we’ll leverage the BenchmarkDotNet library for benchmarking.
Preparation
We’ll create a char array of 1024 characters. Then in each benchmark method, we will loop for a set number of iterations, creating strings of length 64 from the char array. We will perform a benchmark test for iteration counts 1,000; 10,000; and 100,000.
Let’s examine the methods:
    [Benchmark]
    public IList<string> UseString()
    {
        _dest.Clear();
        var startIndex = 0;
        for (var i = 0; i < Iterations; i++)
        {
            if (startIndex + ChunkSize > _charArray.Length)
            {
                startIndex = 0;
            }

            _dest.Add(new string(_charArray, startIndex, ChunkSize));
            startIndex += ChunkSize;
        }

        return _dest;
    }

    [Benchmark]
    public IList<string> UseStringPool()
    {
        _dest.Clear();
        var startIndex = 0;
        for (var i = 0; i < Iterations; i++)
        {
            if (startIndex + ChunkSize > _charArray.Length)
            {
                startIndex = 0;
            }

            ReadOnlySpan<char> span = _charArray.AsSpan(startIndex, ChunkSize);
            _dest.Add(StringPool.Shared.GetOrAdd(span));
            startIndex += ChunkSize;
        }

        return _dest;
    }

    [Benchmark]
    public IList<string> UseStringBuilder()
    {
        _dest.Clear();
        var sb = new StringBuilder();
        var startIndex = 0;
        for (var i = 0; i < Iterations; i++)
        {
            if (startIndex + ChunkSize > _charArray.Length)
            {
                startIndex = 0;
            }

            sb.Append(_charArray.AsSpan(startIndex, ChunkSize));
            _dest.Add(sb.ToString());
            sb.Clear();
            startIndex += ChunkSize;
        }

        return _dest;
    }
Here, we establish three benchmark methods tailored for string, StringPool, and StringBuilder. In each iteration of the loop we create a string instance based on a chunk from our _charArray. We then store each string in a list and finally return the list. This helps to prevent the string creation from being optimized away.
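The benchmark methods rely on a few members that are not shown: _charArray, _dest, ChunkSize, and Iterations. The following is only a sketch of how the surrounding class might be set up, based on the description in the Preparation section. The class name, the fill pattern for the char array, and the use of [GlobalSetup] are assumptions, not the article’s actual code:

```csharp
using System.Collections.Generic;
using BenchmarkDotNet.Attributes;

// MemoryDiagnoser adds the Gen0/Gen1/Gen2 and Allocated columns to the report.
[MemoryDiagnoser]
public class StringPoolBenchmarks // hypothetical class name
{
    // Each generated string is 64 characters long.
    private const int ChunkSize = 64;

    // Source buffer of 1024 characters, i.e. 16 distinct chunks that repeat.
    private readonly char[] _charArray = new char[1024];

    // Destination list that keeps the created strings alive.
    private readonly List<string> _dest = new();

    // BenchmarkDotNet runs every benchmark once per listed value.
    [Params(1_000, 10_000, 100_000)]
    public int Iterations { get; set; }

    [GlobalSetup]
    public void Setup()
    {
        // Assumed fill pattern: any deterministic content would do.
        for (var i = 0; i < _charArray.Length; i++)
        {
            _charArray[i] = (char)('a' + i % 26);
        }
    }

    // The UseString, UseStringPool, and UseStringBuilder methods go here.
}
```

Because the source buffer only holds 16 chunks, every run beyond the first pass over the buffer produces duplicates, which is the situation where pooling can pay off.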
Benchmark
Now we are ready to run our benchmark and analyze the outcome. It is important to note that since the focus of StringPool is reducing memory allocation, our main focus in the benchmark is on allocations more than on speed:
| Method           | Iterations | Mean         | Gen0      | Gen1      | Gen2     | Allocated  |
|----------------- |----------- |-------------:|----------:|----------:|---------:|-----------:|
| UseString        | 1000       | 22.00 us     | 69.4885   | 6.2256    | -        | 152000 B   |
| UseStringBuilder | 1000       | 30.27 us     | 68.5425   | 9.8267    | -        | 152424 B   |
| UseStringPool    | 1000       | 90.21 us     | -         | -         | -        | -          |
|                  |            |              |           |           |          |            |
| UseString        | 10000      | 388.20 us    | 273.4375  | 226.0742  | -        | 1520000 B  |
| UseStringBuilder | 10000      | 435.84 us    | 272.9492  | 226.5625  | -        | 1520424 B  |
| UseStringPool    | 10000      | 924.97 us    | -         | -         | -        | -          |
|                  |            |              |           |           |          |            |
| UseStringPool    | 100000     | 8,882.90 us  | -         | -         | -        | 6 B        |
| UseStringBuilder | 100000     | 17,424.78 us | 3000.0000 | 2062.5000 | 562.5000 | 15200686 B |
| UseString        | 100000     | 19,311.59 us | 3000.0000 | 2093.7500 | 562.5000 | 15200250 B |
As our benchmark results illustrate, StringPool excels in efficient memory management, showing little to no memory allocation at every iteration count. That saving has a cost: at 1,000 and 10,000 iterations, UseStringPool is the slowest of the three methods, roughly two to four times slower than the others. Interestingly, when we move to 100,000 iterations, the UseStringPool method is about twice as fast as the other methods. This is because the other two methods put increasing pressure on the garbage collector, as seen in the Gen0, Gen1, and Gen2 columns of the benchmark, while UseStringPool triggers no collections at all. This is exactly the type of scenario that StringPool was created for.
The benchmark helps to emphasize the significance of selecting the most appropriate method based on the unique requirements and limitations of our application. StringPool is not necessarily the right tool for everyday tasks, but in those scenarios where we expect to generate many duplicate strings, it can help reduce memory pressure and potentially improve application performance.
Conclusion
In this article, we delved into the notion of StringPool within C# and how we can use StringPool to reduce string allocations, thereby enhancing memory efficiency. By grasping the intricacies of string management in C# and employing StringPool efficiently, we can significantly boost our application’s performance, especially in situations where string values are recurrently utilized.