In this article, we’ll learn about the new syntax for UTF-8 string literals in C# 11, highlighting the performance enhancements vs. the older API.
Character encoding is crucial for displaying text accurately and consistently across languages in modern web development. The most widely used encoding is UTF-8, which stores each character in 1 to 4 bytes. Within the .NET ecosystem, however, strings use UTF-16, where each character takes at least 2 bytes.
How Encoding Works in .NET
Writing an HTTP request or generating raw HTML for the browser requires converting UTF-16 to UTF-8. ASP.NET Core handles this conversion automatically, without requiring any action from the developer. Converting manually is only necessary in low-level web development, so it is not something most developers will need to do.
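To make the size difference between the two encodings concrete, here is a minimal sketch that counts the bytes a string needs in each one. The class and method names (EncodingSizes, Utf8Bytes, Utf16Bytes) are hypothetical and exist only for this illustration; Encoding.UTF8 and Encoding.Unicode (little-endian UTF-16) are the standard .NET encodings.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only; not part of .NET or the article's code.
public static class EncodingSizes
{
    // Number of bytes needed to store the text as UTF-8 (1 to 4 bytes per character).
    public static int Utf8Bytes(string text) => Encoding.UTF8.GetByteCount(text);

    // Number of bytes needed to store the text as UTF-16 (at least 2 bytes per character).
    // Encoding.Unicode is .NET's little-endian UTF-16 encoding.
    public static int Utf16Bytes(string text) => Encoding.Unicode.GetByteCount(text);
}
```

For example, EncodingSizes.Utf8Bytes("Hello World!") returns 12, while EncodingSizes.Utf16Bytes("Hello World!") returns 24, because every ASCII character fits in one UTF-8 byte but still takes two bytes in UTF-16. For a character such as "€", the counts are 3 and 2 respectively.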
Before C# 11, developers used the Encoding class to convert a .NET UTF-16 string to UTF-8:
public byte[] OldSyntax()
{
    return Encoding.UTF8.GetBytes("Hello World!");
}
Here, we convert the string "Hello World!" into a byte array using UTF-8 encoding and return it. This approach is inefficient because every call encodes the string at runtime and allocates a new byte array on the heap. In web applications that process many requests per second, that repeated work can add up to a significant cost.
Is there a more efficient way to reduce allocations? Yes, there is. Let’s examine how this can be enhanced using C# 11.
Encoding Improvements With C# 11
In C# 11, a better way to get the UTF-8 bytes of a string literal is to add the u8 suffix to it:
public ReadOnlySpan<byte> NewSyntax()
{
    return "Hello World!"u8;
}
Instead of returning an array, this method returns a ReadOnlySpan<byte>. We may wonder why the .NET team made this choice. The reason is that the new syntax is not only more elegant but also more efficient: the compiler encodes the literal at compile time and stores the bytes in the assembly, so using it doesn’t allocate any memory on the heap.
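As a rough sketch of how the u8 suffix behaves in practice, the following class exposes a UTF-8 literal, compares it with what the Encoding class produces, and shows the escape hatch for APIs that still expect a byte[]. The names Utf8LiteralDemo, Greeting, MatchesEncodingClass, and GreetingAsArray are made up for this example, and the code assumes a project that targets C# 11 or later.

```csharp
using System;
using System.Text;

// Hypothetical demo class for illustration only; assumes C# 11 or later.
public static class Utf8LiteralDemo
{
    // The u8 suffix applies to string literals only, not to string variables.
    public static ReadOnlySpan<byte> Greeting => "Hello World!"u8;

    // Checks that the literal holds the same bytes the Encoding class produces at runtime.
    public static bool MatchesEncodingClass() =>
        Greeting.SequenceEqual(Encoding.UTF8.GetBytes("Hello World!"));

    // When an API requires a byte[], ToArray() copies the span into a new array,
    // which does allocate on the heap.
    public static byte[] GreetingAsArray() => Greeting.ToArray();
}
```

MatchesEncodingClass() returns true, since both approaches yield the same 12 bytes. Note that calling ToArray() gives up the allocation benefit, so it is best reserved for cases where a span cannot be passed directly.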
To verify this, let’s examine the benchmark results:
| Method    |       Mean |     Error |    StdDev |   Gen0 | Allocated |
|---------- |-----------:|----------:|----------:|-------:|----------:|
| OldSyntax | 27.8322 ns | 0.1793 ns | 0.1589 ns | 0.0048 |      40 B |
| NewSyntax |  0.1011 ns | 0.0043 ns | 0.0040 ns |      - |         - |
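From the results, it is evident that the new syntax is far more efficient: NewSyntax completes in a fraction of a nanosecond and allocates nothing, while OldSyntax takes about 28 ns and allocates 40 B per call. Since benchmarks only measure performance, we can also write unit tests to confirm that both methods return the same bytes.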
Conclusion
In this article, we looked at the new UTF-8 string literal syntax. UTF-8 is the web’s standard encoding, used for HTTP and HTML, and ASP.NET Core converts from UTF-16 to UTF-8 automatically. When we do need UTF-8 bytes directly, C# 11’s u8 suffix for string literals improves efficiency by providing a ReadOnlySpan<byte> without allocating memory on the heap.