In this article, we will examine several techniques for converting a hexadecimal string to a byte array.
In our previous article, Converting a Byte Array to Hexadecimal String, we explored several techniques for converting binary data to hexadecimal. This time, we will explore several methods for reversing that conversion.
Motivation for Converting a Hexadecimal String to a Byte Array
Our previous article discussed some of the use cases for converting a byte array into a hexadecimal string. One very common use case is as a means of providing a license key string to a user of our software. While the hex string is something easy for an end user to deal with, within our software we will need to convert it back into an array of bytes.
Hexadecimal strings are often used for transmitting data over a network connection. They are frequently found when dealing with encryption. While they are useful for the transfer and storage of data, within the context of our executing application, we generally need to convert the data back into an array of bytes.
With that in mind, let’s take a look at five different techniques we can use to convert a hexadecimal string to a byte array.
Before We Begin
For each of our examples, we will use the following hexadecimal string:
"0xDEADBEEFDECAFBAD"
This string is the hexadecimal encoding of the following byte array:
[222,173,190,239,222,202,251,173]
Hexadecimal strings are often written with the prefix 0x for clarity, but this is not required. Because of this, and to make our methods more general, we will support both prefixed and unprefixed strings. Our methods will also support converting both uppercase and lowercase hexadecimal strings to byte arrays.
For our article, we will create a simple static class ConversionHelpers and will add each of our methods there.
Converting a Hexadecimal String to a Byte Array Using Modular Arithmetic
The first method we will look at performs modular arithmetic on each character in the string to convert it back into an array of bytes. First, let’s take a look at the workhorse of our method, the modular calculation.
Modular Calculation
To prevent code duplication and to make the method easier to read, let’s define a helper method to perform the modular calculation:
private static int PerformModularArithmeticCalculation(char value) => (value % 32 + 9) % 25;
This calculation converts an individual hex character back into its decimal value. Let’s unpack the method to see exactly what it is doing.
First, using the modulus operator, we restrict our char value to the range 0-31. But where did this “magic” value of 32 come from? We chose it based on the ASCII value of the characters. ‘A’ has a value of 65 while ‘a’ is 97. Subtracting them yields a distance of 32. This means performing a modulus 32 calculation on either of them yields the same result, namely 1.
Next, we add 9 to our result. Keeping in mind that the decimal value of the hexadecimal digit A is 10, it should be clear why: the modulus left ‘A’ at 1, and the goal of our final calculation is ‘A’ = 10. Adding 9 therefore maps A-F (1-6 after the modulus) to 10-15.
Our last step is another modulus operation. Once again we have an apparently “magic” number: 25. To find the source of this magic, we need to revisit the ASCII character values. This time let’s look at the ASCII values for the digits 0-9. The character ‘0’ has an ASCII value of 48, and 48 mod 32 is 16. Adding 9 yields 25 (our “magic” number), and of course, 25 % 25 is 0. Since the characters ‘0’-‘9’ are contiguous within the ASCII standard, they arrive at this step as 25-34, and the modulus operation converts them to their equivalent decimal values 0-9. The letters are unaffected, because their values of 10-15 are already below 25.
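To make the three steps easier to follow, here is a small sketch that prints each intermediate value for a single character. The class and method names (ModularArithmeticDemo, HexValue, Trace) are made up for illustration; only the formula itself comes from the helper above.

```csharp
using System;

public static class ModularArithmeticDemo
{
    // Same formula as the helper above: maps a hex character to its value 0-15.
    public static int HexValue(char value) => (value % 32 + 9) % 25;

    // Hypothetical tracing helper: shows every intermediate step for one character.
    public static string Trace(char value)
    {
        int mod32 = value % 32;   // folds 'A' and 'a' onto the same number
        int plus9 = mod32 + 9;    // letters become 10-15, digits become 25-34
        int result = plus9 % 25;  // digits wrap around to 0-9, letters are unchanged
        return $"'{value}' ({(int)value}): % 32 = {mod32}, + 9 = {plus9}, % 25 = {result}";
    }
}
```

For example, Trace('0') produces '0' (48): % 32 = 16, + 9 = 25, % 25 = 0, while Trace('f') produces 'f' (102): % 32 = 6, + 9 = 15, % 25 = 15.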
Our Method
Now, let’s take a look at the whole method:
public static byte[] FromHexWithModularArithmetic(ReadOnlySpan<char> input)
{
    if (input.Length % 2 != 0)
        throw new ArgumentException("Input has invalid length", nameof(input));
    if (input.StartsWith("0x"))
        input = input[2..];
    if (input.IsEmpty)
        return Array.Empty<byte>();
    var dest = new byte[input.Length >> 1];
    for (int i = 0, j = 0; j < dest.Length; j++)
        dest[j] = (byte) ((PerformModularArithmeticCalculation(input[i++]) << 4) +
                          PerformModularArithmeticCalculation(input[i++]));
    return dest;
}
First, we ensure our string is of a proper length. Converting a byte array to a hexadecimal string effectively doubles the length of our data, as it requires two hexadecimal characters to represent each byte. This means that, for our hex string to be valid, its length must be a multiple of 2. Because the optional 0x prefix is itself two characters long, we can perform this check before stripping it. If the length is invalid, we throw an ArgumentException.
Secondly, and as we mentioned earlier, to make our methods more general, we are supporting strings with and without a leading 0x (“hex marker”). In order to do that, after validating for proper length, we check for the hex marker and strip it off if it exists. Since we are using a ReadOnlySpan<char> here, all we need to do is reset our input to be the slice of the span after the hex marker.
Thirdly, we check for an empty string. In the case of an empty string, we can bail early, returning an empty array of bytes.
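One important thing to note with this method is the lack of any other validation on the input. It assumes that the input contains only valid hexadecimal characters. If it does not, the arithmetic still produces a byte, so invalid input yields incorrect output rather than an exception. This is one drawback of this method.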
Next, we allocate our destination array, which has a length that is exactly one-half of the hexadecimal string length.
Lastly, we iterate the string, converting each pair of characters into their associated byte representation.
Each character makes up a nibble, with the first character representing the high bits of our byte and the second character the low bits. For this reason, we bit shift the first character calculation by 4 before adding it to the second character calculation. And finally, we store the result in our destination array.
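The sketch below isolates the loop body for a single pair of characters, which makes both the nibble shift and the missing validation easy to see. NibblePairSketch, DecodePairUnchecked, and IsHex are hypothetical names used only for this illustration; IsHex is one possible guard a caller could run first and is not part of the method above.

```csharp
using System;

public static class NibblePairSketch
{
    // Same formula as PerformModularArithmeticCalculation.
    private static int HexValue(char value) => (value % 32 + 9) % 25;

    // Combines two characters the way the loop body does, with no validation.
    // unchecked makes the truncating cast explicit, matching the default compiler setting.
    public static byte DecodePairUnchecked(char high, char low) =>
        unchecked((byte)((HexValue(high) << 4) + HexValue(low)));

    // Hypothetical guard: true only when every character is 0-9, a-f, or A-F.
    public static bool IsHex(ReadOnlySpan<char> input)
    {
        foreach (var c in input)
        {
            bool isHexDigit = (c >= '0' && c <= '9')
                           || (c >= 'a' && c <= 'f')
                           || (c >= 'A' && c <= 'F');
            if (!isHexDigit) return false;
        }
        return true;
    }
}
```

DecodePairUnchecked('D', 'E') computes (13 << 4) + 14 = 208 + 14 = 222, the first byte of our sample array. DecodePairUnchecked('G', 'G'), on the other hand, quietly returns 16 instead of failing, which is exactly the gap a check such as IsHex would close.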
Usage
Now, let’s take a look at our method in action:
ConversionHelpers.FromHexWithModularArithmetic("0xDEADBEEFDECAFBAD");
Which returns our expected array:
[222,173,190,239,222,202,251,173]
Converting a Hexadecimal String to a Byte Array Using Switch Statement
Now, let’s look at performing the hex conversion using a switch statement.
Switch Computation
First, we need to define a helper method which will perform the main computation for us:
[MethodImpl(MethodImplOptions.AggressiveInlining)]
internal static byte ComputeNibbleFromHexChar(char hexChar) => hexChar switch
{
    >= '0' and <= '9' => (byte) (hexChar - '0'),
    >= 'a' and <= 'f' => (byte) (hexChar - 'a' + 10),
    >= 'A' and <= 'F' => (byte) (hexChar - 'A' + 10),
    _ => throw new ArgumentException($"Invalid hex digit: '{hexChar}'", nameof(hexChar))
};
For performance reasons, we add the attribute MethodImpl(MethodImplOptions.AggressiveInlining), hinting to the JIT compiler that this method is a good candidate for inlining. This doesn’t guarantee inlining but rather instructs the JIT to consider it.
Next, we use a switch expression with pattern matching to convert the hex character to its decimal equivalent. If the hex char is a digit, we simply subtract the value of ‘0’ from the digit character to convert it to decimal. For any of the values A-F and a-f, we perform the same type of subtraction (using ‘A’ or ‘a’ respectively), which will yield a value between 0 and 5, so we need to add 10 to compute the appropriate value.
Lastly, for any value that doesn’t match, we throw an ArgumentException, indicating that the value is not a valid hex digit.
Our Method
Now, let’s take a look at our main method:
public static unsafe byte[] FromHexWithSwitchComputation(ReadOnlySpan<char> input)
{
    // ...Initial checks removed for brevity...
    var dest = new byte[input.Length >> 1];
    fixed (char* s = input)
    fixed (byte* d = dest)
    {
        var destPtr = d;
        var srcPtr = s;
        while (*srcPtr != 0)
        {
            var result = (byte) (ComputeNibbleFromHexChar(*srcPtr++) << 4);
            result |= ComputeNibbleFromHexChar(*srcPtr++);
            *destPtr++ = result;
        }
    }
    return dest;
}
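After the initial checks, we allocate our destination array. Next, for performance reasons, we pin both our input and destination arrays. In the next lines, we make a copy of the pointers to enable incrementing them within our loop. Then we begin looping through the string until we reach the null terminator. Note that this loop condition is only safe when the span runs to the end of a string, since a pinned .NET string is always null-terminated. A span over other memory, such as a slice of a char array or the middle of a string, has no terminator at its end.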
Inside the loop, we perform several actions. Our first line itself performs three actions. First, it computes the decimal value of the hex character. Second, it moves our srcPtr to the next character in the hex string. Last, since this is the high nibble of our resulting byte, it left-shifts the value by 4.
Our second line is similar to the first, only without the need to perform a bit shift since this is the low nibble of the resultant byte.
Lastly, we store the result in the destination array and increment the destPtr to the next element in the destination.
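For comparison, here is a sketch of the same switch-based conversion written without pointers. It is not the benchmarked implementation: the class and method names (SafeSwitchSketch, FromHexSafe, Nibble) are invented for this example, and the loop is bounded by the destination length instead of a null terminator, so it also works for spans that are not backed by a whole string.

```csharp
using System;

public static class SafeSwitchSketch
{
    // Same mapping as ComputeNibbleFromHexChar.
    private static byte Nibble(char hexChar) => hexChar switch
    {
        >= '0' and <= '9' => (byte)(hexChar - '0'),
        >= 'a' and <= 'f' => (byte)(hexChar - 'a' + 10),
        >= 'A' and <= 'F' => (byte)(hexChar - 'A' + 10),
        _ => throw new ArgumentException($"Invalid hex digit: '{hexChar}'", nameof(hexChar))
    };

    // Hypothetical safe counterpart: indexes the span instead of walking raw pointers.
    public static byte[] FromHexSafe(ReadOnlySpan<char> input)
    {
        if (input.Length % 2 != 0)
            throw new ArgumentException("Input has invalid length", nameof(input));
        if (input.StartsWith("0x", StringComparison.Ordinal))
            input = input[2..];

        var dest = new byte[input.Length >> 1];
        for (int i = 0; i < dest.Length; i++)
        {
            // High nibble first, then the low nibble of the same byte.
            dest[i] = (byte)((Nibble(input[2 * i]) << 4) | Nibble(input[2 * i + 1]));
        }
        return dest;
    }
}
```

The trade-off is that every span access is bounds-checked, which may cost some of the speed the pointer version is aiming for. We have not benchmarked this variant.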
Usage
So, let’s see it in action:
ConversionHelpers.FromHexWithSwitchComputation("0xDEADBEEFDECAFBAD");
Again we see the method returns the expected result:
[222,173,190,239,222,202,251,173]
Converting a Hexadecimal String to a Byte Array Using Bit Manipulation
Our next conversion method involves performing a simple bit manipulation calculation.
Bit Manipulation Calculation
First, let’s take a look at our bit calculation:
[MethodImpl(MethodImplOptions.AggressiveInlining)]
private static int PerformBitManipulation(int charValue)
{
    charValue -= 'A';
    return charValue + 10 + ((charValue >> 31) & 7);
}
Let’s walk through our code to understand what our calculation is doing. In our first step, we subtract the value of the character ‘A’ (65) from our input. This will be important as we look at the next part of the calculation.
For the next step, we need to consider what the behavior is for the values A-F and 0-9 separately. First, let’s examine A-F. Since the value of A is 65, after our initial subtraction, our value is going to be between 0 and 5. This means that when we add 10 to it, our value will now be between 10 and 15. The following computation, charValue >> 31, extracts the sign from the value: an arithmetic right shift by 31 fills the result with copies of the sign bit, which in this case gives 0. Of course, a bitwise & of anything with 0 yields 0. So, for the values A-F, the calculation is equivalent to value - 65 + 10, where value is the original character value.
Now let’s consider the case of characters 0-9. When we subtract 65 from these we end up with a value in the range between -17 and -8. The next step of adding 10 shifts our range to between -7 and 2. Then in our last step, extracting the sign of the negative result of the subtraction gives -1 (all bits set), so we end up performing the calculation -1 & 7, which yields 7. Adding this to our value shifts it to the range 0 to 9.
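The two cases can be checked with a few lines of code. The sketch below repeats the calculation and exposes the sign extraction as its own method; BitManipulationDemo and SignMask are hypothetical names introduced only for this illustration.

```csharp
public static class BitManipulationDemo
{
    // Same calculation as PerformBitManipulation above.
    public static int PerformBitManipulation(int charValue)
    {
        charValue -= 'A';
        return charValue + 10 + ((charValue >> 31) & 7);
    }

    // Arithmetic right shift by 31: 0 for non-negative values, -1 (all bits set) for negative ones.
    public static int SignMask(int value) => value >> 31;
}
```

SignMask('C' - 'A') is 0, so ‘C’ becomes 2 + 10 + 0 = 12. SignMask('7' - 'A') is -1, so ‘7’ becomes -10 + 10 + 7 = 7. A lowercase letter is not handled at this stage: PerformBitManipulation('a') returns 42.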
Our Method
Now, let’s put it all together:
public static unsafe byte[] FromHexWithBitManipulation(ReadOnlySpan<char> input)
{
    // ...Initial checks removed for brevity...
    var dest = new byte[input.Length >> 1];
    fixed (char* srcPtr = input)
    fixed (byte* destPtr = dest)
    {
        var sPtr = &srcPtr[0];
        var dPtr = &destPtr[0];
        while (*sPtr != 0)
        {
            var hi = PerformBitManipulation(*sPtr++);
            var lo = PerformBitManipulation(*sPtr++) & 0x0F;
            *dPtr++ = (byte) (lo | (hi << 4));
        }
    }
    return dest;
}
As we saw earlier, after our initial validation checks, we begin by allocating our destination array. This is followed by getting pointers into our input data and destination array. Next, we make copies of the pointers so we can increment them, and finally begin iterating through the input.
For each character, we perform our bit manipulation, but for the low part of the byte, we also perform a bitwise & with 0x0F, which handles lowercase a-f. It is equivalent to conditionally subtracting 32 (the difference between the ASCII values of A and a) from the value when the input is lowercase. The high part needs no mask: after the left shift by 4, the extra bit lands above bit 7 and is discarded by the cast to byte.
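Here is a short sketch of that masking behavior for a single pair of characters. LowercaseMaskDemo and DecodePair are hypothetical names; the arithmetic mirrors the loop body above, with unchecked spelled out to make the truncating cast explicit (this is the compiler default unless checked arithmetic is enabled).

```csharp
public static class LowercaseMaskDemo
{
    private static int PerformBitManipulation(int charValue)
    {
        charValue -= 'A';
        return charValue + 10 + ((charValue >> 31) & 7);
    }

    // Mirrors the loop body: mask the low nibble, let the byte cast trim the high one.
    public static byte DecodePair(char high, char low)
    {
        var hi = PerformBitManipulation(high);        // 'd' gives 45 (0x2D), not 13
        var lo = PerformBitManipulation(low) & 0x0F;  // 'e' gives 46, masked down to 14
        return unchecked((byte)(lo | (hi << 4)));     // 0x2DE truncated to 0xDE
    }
}
```

For the pair ‘d’ and ‘e’, hi << 4 is 0x2D0, combining with lo gives 0x2DE, and the cast keeps only the low eight bits, 0xDE (222).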
Usage
Now, let’s watch it in action, but for fun, we’ll switch it up a bit and pass in lowercase hexadecimal digits:
ConversionHelpers.FromHexWithBitManipulation("0xdeadbeefdecafbad");
Once again the method returns the expected result:
[222,173,190,239,222,202,251,173]
Converting a Hexadecimal String to a Byte Array Using FromHexString()
.NET 5 brought us a long-awaited addition to the Convert static class, the ToHexString() and FromHexString() methods. In our previous article, we saw how to use the ToHexString() method to convert a byte array to hex. Now, we can use the FromHexString() method to reverse that operation.
Let’s take a look at the code:
public static byte[] FromHexWithConvert(ReadOnlySpan<char> input)
{
    if (input.StartsWith("0x"))
        input = input[2..];
    return Convert.FromHexString(input);
}
There is almost nothing to this method. As we have seen in previous methods, we first check for the “hex marker” in the string and remove it if necessary. After that we only have one other step to perform, namely calling the built-in FromHexString() method. In our case, we are passing in a ReadOnlySpan<char>, but the method also has an overload that accepts a string parameter. We don’t need to perform any additional validation on our input data, as the built-in method takes care of that, throwing a FormatException on invalid data.
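Because the built-in method reports bad input by throwing, callers that expect untrusted strings may want a non-throwing wrapper. The sketch below shows one possible shape; ConvertDemo and TryFromHex are hypothetical names and not part of the framework or of ConversionHelpers.

```csharp
using System;

public static class ConvertDemo
{
    public static byte[] FromHexWithConvert(ReadOnlySpan<char> input)
    {
        if (input.StartsWith("0x", StringComparison.Ordinal))
            input = input[2..];
        return Convert.FromHexString(input);
    }

    // Hypothetical wrapper: turns the FormatException into a boolean result.
    public static bool TryFromHex(string input, out byte[] bytes)
    {
        try
        {
            bytes = FromHexWithConvert(input);
            return true;
        }
        catch (FormatException)
        {
            // Thrown for an odd number of characters or any non-hex character.
            bytes = Array.Empty<byte>();
            return false;
        }
    }
}
```

With this in place, TryFromHex("0xDEADBEEF", out var bytes) succeeds with four bytes, while both "ABC" (odd length) and "XYZ1" (invalid characters) return false.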
One of the biggest advantages of calling a framework method is maintenance. We don’t have to worry about maintaining this method. We don’t have to add tests around it to protect ourselves from silly mistakes in future code updates. It is part of the framework so we can expect it to be correct and reasonably performant.
Usage
Let’s observe the built-in method in action:
ConversionHelpers.FromHexWithConvert("0xDEADBEEFDECAFBAD");
As expected, it returns the following array:
[222,173,190,239,222,202,251,173]
Converting a Hexadecimal String to a Byte Array Using a Lookup Table
The last method we will consider is converting using a precomputed lookup table. In our previous article, we discussed the speed/space tradeoff and how we can often increase performance at the cost of using more memory. This is a prime example of that truth.
We precompute two lookup tables: one for the high bits and one for the low bits of our converted byte. For brevity’s sake, we are not listing the tables in the article, but both the tables and a method for computing them can be found in the GitHub repo for this article.
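As a rough idea of what such tables contain, here is one way they could be generated. This is only a sketch under stated assumptions, not the code from the repo: we assume each table has 103 entries (indices 0 through ‘f’), that invalid positions hold 255, and that the high table stores each value already shifted left by 4. The names LookupTableSketch, BuildTable, High, and Low are made up for the example.

```csharp
using System;

public static class LookupTableSketch
{
    private const int TableSize = 'f' + 1; // 103 entries, indices 0..102 (assumed layout)
    private const byte Invalid = 255;      // marker for characters that are not hex digits

    // shift = 4 builds the high-nibble table, shift = 0 the low-nibble table.
    public static byte[] BuildTable(int shift)
    {
        var table = new byte[TableSize];
        Array.Fill(table, Invalid);

        for (int v = 0; v < 10; v++)
            table['0' + v] = (byte)(v << shift);

        for (int v = 10; v < 16; v++)
        {
            table['A' + v - 10] = (byte)(v << shift);
            table['a' + v - 10] = (byte)(v << shift);
        }
        return table;
    }

    public static readonly byte[] High = BuildTable(4);
    public static readonly byte[] Low = BuildTable(0);
}
```

Under these assumptions, High['D'] is 0xD0 and Low['e'] is 0x0E, so adding the two lookups for the pair “De” gives 0xDE. The value 255 is safe as a marker in both tables, because no valid high entry (0x00 to 0xF0 in steps of 0x10) or low entry (0 to 15) equals it.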
Now, let’s take a look at the code:
public static unsafe byte[] FromHexWithLookup(ReadOnlySpan<char> input)
{
    // ...Initial checks removed for brevity...
    var dest = new byte[input.Length >> 1];
    fixed (char* inputRef = input)
    fixed (byte* hiRef = LookupTables.FromHexHighBitsLookup)
    fixed (byte* lowRef = LookupTables.FromHexLowBitsLookup)
    fixed (byte* destRef = dest)
    {
        var s = &inputRef[0];
        var d = destRef;
        while (*s != 0)
        {
            byte lowValue;
            if (*s > 102 || (*d = hiRef[*s++]) == 255 ||
                *s > 102 || (lowValue = lowRef[*s++]) == 255)
                throw new ArgumentException($"Invalid character found in string: '{*s}'", nameof(input));
            *d++ += lowValue;
        }
        return dest;
    }
}
After the initial checks, we allocate the destination array. Following that, we get pointers to our input, to the lookup tables, and to our destination. Then, as we have seen before, we create two incrementable pointers to our input and destination arrays.
Once all of our setup is done, we loop through the string until we hit the null terminator (end of string marker), converting the hexadecimal character pairs to bytes as we go.
The Magic Inside the Loop
The real magic of our method occurs inside the loop. First, we have to create a variable to hold the low nibble of our final byte. The next step is a rather complicated-looking if statement, but this is the workhorse of the method, so let’s break it down step by step.
Our first check ensures the character is not too large to be a valid hex character (f is 102 in ASCII).
The next statement performs a table lookup for the high nibble using the character value as an index, while also incrementing the input pointer. This check also assigns the lookup result to the destination pointer. The equality check here ensures that the input character is a valid hexadecimal digit. In our lookup table, we set all invalid indices to the value 255. If a lookup returns 255, then we know the character is not valid.
The second half of the if statement performs the same operation, only now using the low-bit lookup table. The other main difference is that we also assign the lookup result to a temporary variable, which we add to the destination byte in the next step.
Assuming all was valid in the if block, we add the low nibble to the byte at the destination pointer and then advance the pointer to the next location in the destination array.
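The compound if statement can be easier to digest when each check is written on its own line. The following is a span-based sketch of the same steps, not the benchmarked pointer version. The tables are rebuilt locally under the assumption that they hold 103 entries, use 255 for invalid characters, and store high nibbles pre-shifted. LookupDecodeSketch, Build, and Decode are hypothetical names.

```csharp
using System;

public static class LookupDecodeSketch
{
    private static readonly byte[] High = Build(4); // nibble value << 4
    private static readonly byte[] Low = Build(0);  // nibble value as-is

    private static byte[] Build(int shift)
    {
        var table = new byte[103];
        Array.Fill(table, (byte)255);
        const string digits = "0123456789ABCDEF";
        for (int v = 0; v < digits.Length; v++)
        {
            table[digits[v]] = (byte)(v << shift);
            table[char.ToLowerInvariant(digits[v])] = (byte)(v << shift);
        }
        return table;
    }

    public static byte[] Decode(ReadOnlySpan<char> input)
    {
        if (input.Length % 2 != 0)
            throw new ArgumentException("Input has invalid length", nameof(input));
        if (input.StartsWith("0x", StringComparison.Ordinal))
            input = input[2..];

        var dest = new byte[input.Length / 2];
        for (int i = 0; i < dest.Length; i++)
        {
            char hiChar = input[2 * i];
            char loChar = input[2 * i + 1];

            // Range check first, so the table is never indexed out of bounds.
            if (hiChar > 'f' || High[hiChar] == 255)
                throw new ArgumentException($"Invalid character found in string: '{hiChar}'", nameof(input));
            if (loChar > 'f' || Low[loChar] == 255)
                throw new ArgumentException($"Invalid character found in string: '{loChar}'", nameof(input));

            dest[i] = (byte)(High[hiChar] + Low[loChar]);
        }
        return dest;
    }
}
```

Reading each character into a local before the checks also means the exception message names the character that actually failed, at the cost of the extra bounds checks that span indexing performs.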
Usage
Now, let’s put our method to work:
ConversionHelpers.FromHexWithLookup("0xDEADBEEFDECAFBAD");
As before, the method returns the expected array:
[222,173,190,239,222,202,251,173]
Benchmarks
When comparing different ways of performing the same calculation, it is important to benchmark them. This aids us immensely when choosing which method to implement in our own code. We will run benchmarks for different data sizes (strings of 64, 256, 8,192, and 2,097,152 characters), as performance can vary based on the length of the input data.
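The article does not show how the benchmark inputs were produced. One simple way to build hex strings of those sizes is sketched below; BenchmarkInputSketch and CreateHexString are hypothetical names, and the fixed seed is just an assumption to keep runs repeatable.

```csharp
using System;

public static class BenchmarkInputSketch
{
    // Hypothetical helper: builds a hex string of exactly characterCount characters.
    public static string CreateHexString(int characterCount, int seed = 42)
    {
        if (characterCount < 0 || characterCount % 2 != 0)
            throw new ArgumentOutOfRangeException(nameof(characterCount), "Must be a non-negative even number.");

        // Two hex characters per byte, so we need half as many random bytes.
        var bytes = new byte[characterCount / 2];
        new Random(seed).NextBytes(bytes);
        return Convert.ToHexString(bytes);
    }
}
```

Calling CreateHexString(64), CreateHexString(256), and so on yields uppercase strings of the sizes listed above.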
Now let’s take a look at the results.
64-Character Hexadecimal String Results
| Method                 | source               |            Mean |         Error |        StdDev |
|----------------------- |--------------------- |----------------:|--------------:|--------------:|
| UsingLookup            | 3E17B(...)AB036 [64] |        40.74 ns |      0.364 ns |      0.341 ns |
| UsingBitManipulation   | 3E17B(...)AB036 [64] |        50.97 ns |      0.569 ns |      0.532 ns |
| UsingConvert           | 3E17B(...)AB036 [64] |        57.00 ns |      0.381 ns |      0.337 ns |
| UsingSwitchComputation | 3E17B(...)AB036 [64] |        74.34 ns |      0.287 ns |      0.255 ns |
| UsingModularArithmetic | 3E17B(...)AB036 [64] |        83.65 ns |      0.181 ns |      0.161 ns |
256-Character Hexadecimal String Results
| Method                 | source               |            Mean |         Error |        StdDev |
|----------------------- |--------------------- |----------------:|--------------:|--------------:|
| UsingLookup            | CFD6(...)5125 [256]  |       143.39 ns |      0.543 ns |      0.482 ns |
| UsingBitManipulation   | CFD6(...)5125 [256]  |       190.60 ns |      1.254 ns |      1.173 ns |
| UsingConvert           | CFD6(...)5125 [256]  |       197.93 ns |      1.391 ns |      1.301 ns |
| UsingSwitchComputation | CFD6(...)5125 [256]  |       264.38 ns |      3.659 ns |      3.423 ns |
| UsingModularArithmetic | CFD6(...)5125 [256]  |       317.89 ns |      1.510 ns |      1.339 ns |
8,192-Character Hexadecimal String Results
| Method                 | source               |            Mean |         Error |        StdDev |
|----------------------- |--------------------- |----------------:|--------------:|--------------:|
| UsingLookup            | 3577(...)6345 [8192] |     4,186.80 ns |     29.551 ns |     23.072 ns |
| UsingBitManipulation   | 3577(...)6345 [8192] |     5,647.30 ns |     29.041 ns |     27.165 ns |
| UsingConvert           | 3577(...)6345 [8192] |     5,701.99 ns |     26.212 ns |     24.519 ns |
| UsingModularArithmetic | 3577(...)6345 [8192] |     9,780.12 ns |     32.669 ns |     27.280 ns |
| UsingSwitchComputation | 3577(...)6345 [8192] |    13,703.47 ns |    145.273 ns |    128.781 ns |
2,097,152-Character Hexadecimal String Results
| Method                 | source               |            Mean |         Error |        StdDev |
|----------------------- |--------------------- |----------------:|--------------:|--------------:|
| UsingLookup            | 15(...)A3 [2097152]  | 1,340,016.04 ns |  9,457.979 ns |  8,846.999 ns |
| UsingBitManipulation   | 15(...)A3 [2097152]  | 1,676,782.06 ns | 10,580.369 ns |  9,379.221 ns |
| UsingConvert           | 15(...)A3 [2097152]  | 1,677,930.52 ns | 11,928.114 ns | 11,157.565 ns |
| UsingModularArithmetic | 15(...)A3 [2097152]  | 2,867,833.16 ns | 16,109.842 ns | 15,069.156 ns |
| UsingSwitchComputation | 15(...)A3 [2097152]  | 7,451,976.25 ns | 63,807.009 ns | 59,685.116 ns |
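From the benchmark results, we see that in all cases our lookup table method is the most performant. The bit manipulation method is second at every size, and the .NET 5+ FromHexString() method is third, though the gap between the two narrows as the input grows: about 6 ns at 64 characters, and less than 0.1% at 2,097,152 characters, which is well within the reported error. The switch and modular arithmetic methods are the slowest at every size, with the switch method falling to last place from 8,192 characters upward.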
That said, the difference in performance between the top three methods is small. For the 256-character string, the lookup table beats FromHexString() by roughly 55 ns, and even at 8,192 characters the gap is only about 1.5 µs. Unless we are performing a large number of conversions in a tight loop, this difference will all but disappear.
So what does that mean? If we are using a newer framework and are not performing a large number of hexadecimal string-to-byte array conversions in a tight loop, then we should probably use the built-in FromHexString() method. This makes our code cleaner, more readable, and above all, more maintainable. At the end of the day, clear, concise, and maintainable code is a huge win for all involved in the project.
Conclusion
In this article, we have examined several different methods for converting a hexadecimal string into a byte array. We also benchmarked each of the implementations to get a better idea of their performance characteristics across different input sizes. From those results, we saw that the most performant version is our custom lookup table implementation. But unless we need to squeeze out those extra few cycles, and provided we are able to use the newer framework-provided method, we should probably choose that option for the sake of maintainability. Our future self will thank us for keeping the code clear and readable.