Efficiently Converting Strings With UTF-8 String Literals in C#

In this article, we’ll learn about the UTF-8 string literal syntax introduced in C# 11 and compare its performance with the older Encoding-based approach.

To download the source code for this article, you can visit our GitHub repository.

Character encoding determines how text is represented as bytes, so it is crucial for displaying text accurately and consistently across languages. The most widely used encoding on the web is UTF-8, a variable-width encoding that uses 1 to 4 bytes per character. Within the .NET ecosystem, however, strings are stored as UTF-16, where each character takes 2 or 4 bytes.
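To make the size difference concrete, here is a minimal sketch that counts the bytes each encoding needs for a few characters. The sample characters and their labels are our own choice for illustration; Encoding.Unicode is the name .NET uses for little-endian UTF-16:

```csharp
using System;
using System.Text;

// Sample characters from different Unicode ranges, written as escapes
// so the encoding of the source file itself does not matter.
(string Name, string Text)[] samples =
{
    ("Latin letter A", "A"),
    ("e with acute accent", "\u00E9"),
    ("euro sign", "\u20AC"),
    ("grinning face emoji", "\U0001F600"),
};

foreach (var (name, text) in samples)
{
    int utf8Bytes = Encoding.UTF8.GetByteCount(text);
    int utf16Bytes = Encoding.Unicode.GetByteCount(text);
    Console.WriteLine($"{name}: UTF-8 = {utf8Bytes} bytes, UTF-16 = {utf16Bytes} bytes");
}

// Output:
// Latin letter A: UTF-8 = 1 bytes, UTF-16 = 2 bytes
// e with acute accent: UTF-8 = 2 bytes, UTF-16 = 2 bytes
// euro sign: UTF-8 = 3 bytes, UTF-16 = 2 bytes
// grinning face emoji: UTF-8 = 4 bytes, UTF-16 = 4 bytes
```

For mostly ASCII content such as HTTP headers and HTML markup, UTF-8 needs about half the bytes of UTF-16.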

How Encoding Works in .NET

Writing an HTTP request or producing raw HTML for the browser requires converting .NET's UTF-16 strings to UTF-8. In most applications, ASP.NET Core handles this conversion automatically, without requiring any action from the developer. Converting manually is only necessary in low-level web development, so it is not something most developers will need to do.

Before C# 11, developers used the Encoding class to convert a .NET UTF-16 string into UTF-8 bytes:

public byte[] OldSyntax()
{
    return Encoding.UTF8.GetBytes("Hello World!");
}

Here, we convert the string "Hello World!" into a byte array using UTF-8 encoding and return it. This approach is inefficient because the conversion runs every time we call the method, and each call allocates a new byte array on the heap. In web applications that process numerous requests per second, these repeated allocations can add up to a significant cost.
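One quick way to see the per-call allocation is to call the method twice and compare the results. This is a small sketch of our own, with OldSyntax rewritten as a local function so the snippet runs on its own:

```csharp
using System;
using System.Text;

// Local copy of the article's method.
byte[] OldSyntax() => Encoding.UTF8.GetBytes("Hello World!");

byte[] first = OldSyntax();
byte[] second = OldSyntax();

// The text is 12 ASCII characters, so the UTF-8 result is 12 bytes long.
Console.WriteLine(first.Length);                    // 12

// Each call produced its own array, even though the input never changes.
Console.WriteLine(ReferenceEquals(first, second));  // False
```

The input is a constant, yet every call repeats the same work and hands back a fresh array for the garbage collector to clean up later.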

Is there a way to avoid these allocations? Yes, there is. Let’s see how C# 11 improves on this.

Encoding Improvements With C# 11

In C# 11, a better way to get the UTF-8 bytes of a string literal is to add the u8 suffix to it:

public ReadOnlySpan<byte> NewSyntax()
{
    return "Hello World!"u8;
}

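Instead of returning an array, this method returns a ReadOnlySpan<byte>. We may wonder why the language designers made this choice. The compiler encodes the literal as UTF-8 at compile time and stores the bytes in the assembly's data, so the returned span points directly at that data. As a result, the new syntax is not only more concise but also more efficient, since it performs no conversion at runtime and causes no memory allocations on the heap. Note that the suffix works only on string literals; for strings created at runtime, we still need the Encoding class.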
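As a short sketch of how the resulting span can be used (the variable names are ours), the following compares a u8 literal with the output of the Encoding class and shows how to get an array when an API requires one:

```csharp
using System;
using System.Text;

// UTF-8 bytes produced by the compiler from the literal.
ReadOnlySpan<byte> literalBytes = "Hello World!"u8;

// UTF-8 bytes produced at runtime by the Encoding class.
ReadOnlySpan<byte> encodedBytes = Encoding.UTF8.GetBytes("Hello World!");

Console.WriteLine(literalBytes.Length);                       // 12
Console.WriteLine(literalBytes.SequenceEqual(encodedBytes));  // True

// If an API needs a byte[], ToArray() copies the data into a new array.
// This copy does allocate, so it only makes sense when an array is required.
byte[] copy = literalBytes.ToArray();
Console.WriteLine(copy.Length);                               // 12
```

Both approaches yield the same bytes. The difference is when the conversion happens and whether it allocates.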

To verify this, let’s examine the benchmark results:

|    Method |       Mean |     Error |    StdDev |   Gen0 | Allocated |
|---------- |-----------:|----------:|----------:|-------:|----------:|
| OldSyntax | 27.8322 ns | 0.1793 ns | 0.1589 ns | 0.0048 |      40 B |
| NewSyntax |  0.1011 ns | 0.0043 ns | 0.0040 ns |      - |         - |

From the results, it is evident that the new syntax is far faster and allocates no memory, while the old syntax takes about 28 ns and allocates 40 bytes per call. To confirm that both methods produce the same bytes, we can also write unit tests for them.
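For a rough check of the allocation difference without a benchmarking library, we can compare the allocation counter of the current thread before and after calling each method in a loop. This is only a sketch under our own assumptions (the iteration count and the local copies of both methods are ours), it says nothing about timing, and the exact numbers depend on the runtime:

```csharp
using System;
using System.Text;

// Local copies of the two methods from the article.
byte[] OldSyntax() => Encoding.UTF8.GetBytes("Hello World!");
ReadOnlySpan<byte> NewSyntax() => "Hello World!"u8;

const int Iterations = 10_000;
int totalLength = 0; // consume the results so the calls are not optimized away

// Bytes allocated on this thread while calling the old syntax.
long start = GC.GetAllocatedBytesForCurrentThread();
for (int i = 0; i < Iterations; i++)
{
    totalLength += OldSyntax().Length;
}
long oldAllocated = GC.GetAllocatedBytesForCurrentThread() - start;

// Bytes allocated on this thread while calling the new syntax.
start = GC.GetAllocatedBytesForCurrentThread();
for (int i = 0; i < Iterations; i++)
{
    totalLength += NewSyntax().Length;
}
long newAllocated = GC.GetAllocatedBytesForCurrentThread() - start;

Console.WriteLine($"OldSyntax allocated: {oldAllocated} bytes");
Console.WriteLine($"NewSyntax allocated: {newAllocated} bytes");
Console.WriteLine($"Total length processed: {totalLength}");
```

We would expect the first number to grow with the iteration count and the second to stay at or near zero, which matches the Allocated column in the table above.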

Conclusion

In this article, we looked at the UTF-8 string literal syntax introduced in C# 11. UTF-8 is the standard encoding of the web and is used for HTTP and HTML, while .NET strings use UTF-16. ASP.NET Core usually handles the UTF-16 to UTF-8 conversion for us. When we do need the UTF-8 bytes of a string literal, the u8 suffix gives us a ReadOnlySpan<byte> without any runtime conversion or heap allocation.
