In this article, we’ll learn about the new syntax for UTF-8 string literals in C# 11, highlighting the performance enhancements vs. the older API.

To download the source code for this article, you can visit our GitHub repository.

Character encoding is crucial in modern web development for ensuring that text in any language displays accurately and consistently. The most widely used encoding is UTF-8, in which each character takes between 1 and 4 bytes. However, within the .NET ecosystem, strings are encoded as UTF-16, where each character takes at least 2 bytes (either 2 or 4).
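To make the size difference concrete, here is a minimal sketch that counts the bytes each encoding needs for a few sample characters. The sample characters and the labels are our own choices for illustration; the code assumes a console app that uses top-level statements:

```csharp
using System;
using System.Text;

// Hypothetical sample set: one character from each UTF-8 length class.
var samples = new (string Label, string Text)[]
{
    ("U+0041 (Latin letter A)", "A"),
    ("U+00E9 (e with acute)", "\u00E9"),
    ("U+6F22 (CJK ideograph)", "\u6F22"),
    ("U+1F600 (emoji)", "\U0001F600"),
};

foreach (var (label, text) in samples)
{
    int utf8Bytes = Encoding.UTF8.GetByteCount(text);
    int utf16Bytes = Encoding.Unicode.GetByteCount(text); // Encoding.Unicode is UTF-16 (little endian)
    Console.WriteLine($"{label}: UTF-8 = {utf8Bytes} bytes, UTF-16 = {utf16Bytes} bytes");
}
```

The UTF-8 sizes come out as 1, 2, 3, and 4 bytes, while UTF-16 needs 2 bytes for the first three characters and 4 bytes for the emoji, which is stored as a surrogate pair.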

How Encoding Works in .NET

Writing an HTTP request or creating pure HTML for the browser requires converting UTF-16 to UTF-8. ASP.NET Core handles this conversion automatically, without requiring any action from the developer. Converting manually is necessary only in low-level web code, so it is not something most developers will need to do.

Before C# 11, developers used the Encoding class to convert a .NET UTF-16 string:

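// Requires: using System.Text;
public byte[] OldSyntax()
{
    // Encodes the UTF-16 string at run time and allocates a new byte[] on every call.
    return Encoding.UTF8.GetBytes("Hello World!");
}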

Here, we convert the string "Hello World!" into a byte array using UTF-8 encoding and return it. This method is inefficient because it encodes the same text at run time and allocates a new byte array on every call. In web applications that process numerous requests per second, these repeated allocations can add up to a significant cost.

Is there a more efficient way to reduce allocations? Yes, there is. Let’s examine how this can be enhanced using C# 11.

Encoding Improvements With C# 11

In C# 11, a better way to get the UTF-8 bytes of a string literal is to append the u8 suffix to it:

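public ReadOnlySpan<byte> NewSyntax()
{
    // The compiler stores the UTF-8 bytes in the assembly, so nothing is encoded or allocated at run time.
    return "Hello World!"u8;
}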

Instead of returning an array, this method returns a ReadOnlySpan<byte>, which is the natural type of a u8 literal. We may wonder why the .NET team made this choice. The syntax is more concise, and it is also more efficient: the compiler encodes the literal at compile time and stores the bytes in the assembly’s data, so the span points at that data and nothing is allocated on the heap.
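As a rough illustration of how such a span can be consumed, the following sketch writes the literal to a stream, copies it into an array, and decodes it back into a string. It assumes a C# 11 console app (for example, one targeting .NET 7) with top-level statements, and the MemoryStream merely stands in for whatever output stream a real application would use:

```csharp
using System;
using System.IO;
using System.Text;

// The u8 suffix produces a ReadOnlySpan<byte> over data embedded in the assembly.
ReadOnlySpan<byte> greeting = "Hello World!"u8;

// Stream.Write has an overload that accepts a ReadOnlySpan<byte>,
// so no intermediate byte[] is needed. MemoryStream is only a stand-in here.
using var stream = new MemoryStream();
stream.Write(greeting);

// ToArray() copies the bytes into a new heap array; use it only when an API requires byte[].
byte[] copy = greeting.ToArray();

// Decoding the bytes gives back the original text.
string roundTrip = Encoding.UTF8.GetString(greeting);

Console.WriteLine(greeting.Length); // 12: one byte per ASCII character
Console.WriteLine(roundTrip);       // Hello World!
```

Note that the suffix applies only to string literals. A string that is built at run time still has to go through the Encoding class.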

To verify this, let’s examine the benchmark results:

|    Method |       Mean |     Error |    StdDev |   Gen0 | Allocated |
|---------- |-----------:|----------:|----------:|-------:|----------:|
| OldSyntax | 27.8322 ns | 0.1793 ns | 0.1589 ns | 0.0048 |      40 B |
| NewSyntax |  0.1011 ns | 0.0043 ns | 0.0040 ns |      - |         - |

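From the results, the new syntax is clearly faster: roughly 0.1 ns per call with no allocation, compared with about 28 ns and 40 bytes allocated per call for Encoding.UTF8.GetBytes. A mean this close to zero suggests that the method does almost no work at run time. Benchmarks measure speed rather than correctness, so we can additionally write a unit test asserting that both methods return the same bytes.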
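A correctness check of this kind can be sketched without any test framework. The variable names below are our own, and the comparison uses the SequenceEqual extension method for spans, assuming a C# 11 console app with top-level statements:

```csharp
using System;
using System.Text;

// Bytes produced at run time by the older API (byte[] converts implicitly to ReadOnlySpan<byte>).
ReadOnlySpan<byte> fromEncoding = Encoding.UTF8.GetBytes("Hello World!");

// Bytes produced at compile time by the u8 literal.
ReadOnlySpan<byte> fromLiteral = "Hello World!"u8;

// SequenceEqual compares lengths and contents element by element.
bool sameBytes = fromLiteral.SequenceEqual(fromEncoding);

Console.WriteLine(sameBytes); // True
```

In a real test project, the same comparison would typically sit inside an assertion from whichever test framework the project uses.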

Conclusion

In this article, we looked at the new UTF-8 string literal syntax. UTF-8 is the web’s standard encoding and is used for HTTP and HTML, while .NET strings are UTF-16. ASP.NET Core handles the conversion from UTF-16 to UTF-8 automatically. When we need UTF-8 bytes ourselves, C# 11’s u8 suffix on a string literal yields a ReadOnlySpan<byte> directly, avoiding the run-time encoding and heap allocation of Encoding.UTF8.GetBytes.
