Convert a File to a Byte Array in C#


In this article, we will learn about situations where we may need to convert a file into a byte array. Additionally, we will learn two ways to perform the conversion in C#.

If you want to learn how to convert a byte array to a file, check out our convert byte array to a file article.

To download the source code for this article, you can visit our GitHub repository.

So let’s start.

What is a Byte Array?

In C#, a byte array is an array of 8-bit unsigned integers (bytes). Byte arrays are often used to hold the raw representation of more complex data, such as text, images, or audio.

There are several use cases in which we want to convert a file to a byte array. Some of them are:

Loading file contents into memory for processing
Network transmission of file data
File format conversion
File encryption

Generally, a byte array is declared using the byte[] syntax:

byte[] byteArray = new byte[50];

This creates a byte array with 50 elements.
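To make the idea of bytes representing more complex data concrete, here is a minimal sketch that encodes a short string as UTF-8 bytes and decodes it again. The variable name textBytes is ours, chosen only for illustration:

```csharp
using System;
using System.Text;

// Encode a short string as UTF-8: each ASCII character becomes one byte.
byte[] textBytes = Encoding.UTF8.GetBytes("Hi!");

Console.WriteLine(textBytes.Length);                   // 3
Console.WriteLine(string.Join(", ", textBytes));       // 72, 105, 33
Console.WriteLine(Encoding.UTF8.GetString(textBytes)); // Hi!
```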

Some Initial Setup

Each example in this article is exercised through a unit test project. The tests create temporary files and delete them once all the tests have finished executing. For more about unit testing with xUnit and IClassFixture<T>, be sure to check out our article “Differences Between NUnit, xUnit and MSTest”.

When converting a file into a byte array it is important to remember that if the file size exceeds Array.MaxLength, we will not be able to read all of the contents at once. Attempting to do so will result in an exception. In the last section of our article, we will review a technique for solving this problem by working on the file in chunks.
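One way to guard against this is to compare the file length with Array.MaxLength before attempting a full read. The following is a minimal sketch; FitsInSingleArray is a hypothetical helper name and not part of the article's source code:

```csharp
using System;
using System.IO;

// Hypothetical helper: true when the whole file can fit in a single byte[].
static bool FitsInSingleArray(string filePath) =>
    new FileInfo(filePath).Length <= Array.MaxLength;

var path = Path.GetTempFileName();
try
{
    File.WriteAllBytes(path, new byte[] { 1, 2, 3 });
    Console.WriteLine(FitsInSingleArray(path)); // True
}
finally
{
    File.Delete(path);
}
```

If the check fails, the chunked approach discussed at the end of the article is the one to reach for.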

Use ReadAllBytes to Convert a File

One of the most straightforward methods for converting a file is the File.ReadAllBytes() static method:

public void GivenFile_WhenConvertingUsingReadAllBytes_ThenReturnsCorrectContent()
{
    var bytes = File.ReadAllBytes(fixture.SmallTestFile);

    bytes.Should().BeEquivalentTo(fixture.SmallTestFileExpectedBytes);
}

This method opens the file, reads all of the contents into an array, and returns it.

If we are working with asynchronous code, we can use the File.ReadAllBytesAsync() method to accomplish the same goal:

public async Task GivenFile_WhenConvertingUsingReadAllBytesAsync_ThenReturnsCorrectContent()
{
    var bytes = await File.ReadAllBytesAsync(fixture.SmallTestFile);

    bytes.Should().BeEquivalentTo(fixture.SmallTestFileExpectedBytes);
}
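The tests above rely on a fixture that is not shown here. As a stand-alone sketch of the same calls, we can write a few known bytes to a temporary file and read them back with both methods. The file contents and variable names below are made up for illustration:

```csharp
using System;
using System.IO;
using System.Linq;

var path = Path.GetTempFileName();
try
{
    // Arbitrary sample content, including a zero byte and the maximum byte value.
    byte[] expected = { 0x43, 0x4D, 0x00, 0xFF };
    File.WriteAllBytes(path, expected);

    byte[] bytes = File.ReadAllBytes(path);
    byte[] bytesAsync = await File.ReadAllBytesAsync(path);

    Console.WriteLine(bytes.SequenceEqual(expected));      // True
    Console.WriteLine(bytesAsync.SequenceEqual(expected)); // True
}
finally
{
    File.Delete(path);
}
```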

Convert Using MemoryStream

A second technique we can consider for converting a file to a byte array is to stream the file into a MemoryStream. While this results in additional memory pressure, if we already have an open FileStream object, we can write the stream directly into a MemoryStream and then return the result as an array:

public static byte[] ConvertUsingMemoryStream(string filePath)
{
    using var fs = File.OpenRead(filePath);
    using var ms = new MemoryStream(DefaultBufferSize);

    fs.CopyTo(ms);

    return ms.ToArray();
}

The code here is pretty straightforward. We open a FileStream for reading. Next, we initialize a new MemoryStream using a DefaultBufferSize constant (4096 for our example) as our initial stream capacity to reduce the number of reallocations in the MemoryStream while processing the file. Once our initial setup is done we simply copy the FileStream to our MemoryStream and then convert the MemoryStream to an array.

As with the File.ReadAllBytes() method, we can also perform a conversion using a MemoryStream in an asynchronous fashion:

public static async Task<byte[]> ConvertUsingMemoryStreamAsync(string filePath)
{
    await using var fs = File.OpenRead(filePath);
    await using var ms = new MemoryStream(DefaultBufferSize);

    await fs.CopyToAsync(ms);

    return ms.ToArray();
}
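When we already hold an open stream, a variation on this idea is to accept the Stream itself and, if it is seekable, size the MemoryStream from the remaining length so it never has to grow. This is a sketch under our own assumptions; ToByteArray is a hypothetical helper that does not appear in the article's repository:

```csharp
using System;
using System.IO;

// Hypothetical helper: copies the rest of an already open stream into a new array.
static byte[] ToByteArray(Stream source)
{
    // A seekable stream can report how much data remains, so we pre-size the
    // MemoryStream; otherwise we fall back to a small default capacity.
    var capacity = 4096;
    if (source.CanSeek)
    {
        long remaining = Math.Max(0L, source.Length - source.Position);
        capacity = (int)Math.Min(remaining, Array.MaxLength);
    }

    using var ms = new MemoryStream(capacity);
    source.CopyTo(ms);
    return ms.ToArray();
}

var path = Path.GetTempFileName();
try
{
    File.WriteAllBytes(path, new byte[] { 1, 2, 3, 4, 5 });

    using var fs = File.OpenRead(path);
    var bytes = ToByteArray(fs);
    Console.WriteLine(bytes.Length); // 5
}
finally
{
    File.Delete(path);
}
```

Note that ToArray() still makes a final copy, so this reduces intermediate reallocations rather than eliminating allocations.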

Convert a File Using a Rented Byte Array

Both of the previous techniques involve creating new arrays (or, in the case of the MemoryStream, potentially multiple new arrays). With the introduction of ArrayPool<T> to .NET, we have some additional options that allow us to load the contents of a file into memory with far fewer new allocations. This proves especially useful when processing a large number of files and only needing the data in memory temporarily.

For a deeper dive into ArrayPool<T>, don’t miss our article: “Memory Optimization With ArrayPool in C#”. Now, let’s see how we can optimize our memory usage while loading files into a byte array.

Read File Into Rented Byte Array

Our first option is to rent an array, load the file data into it, and return the rented array:

public static (byte[] rentedArray, int length) ConvertToPooledArray(string filePath)
{
    var fileInfo = new FileInfo(filePath);
    ArgumentOutOfRangeException.ThrowIfGreaterThan(fileInfo.Length, Array.MaxLength, "File length");

    var length = (int)fileInfo.Length;
    var array = ArrayPool<byte>.Shared.Rent(length);
    var span = array.AsSpan(0, length);

    using var fs = fileInfo.OpenRead();
    fs.ReadExactly(span);

    return (array, length);
}

Here, we first load the file information and ensure the length is within the bounds of Array.MaxLength. Next, we cast the file length value to an int (which we know is safe because of our previous validation). Using this length we rent an array from the shared ArrayPool<byte>, and then take a Span<byte> of the appropriate length over the rented array. We take a Span over the array because arrays rented from the ArrayPool may be longer than the requested size.

Next, we open a FileStream for reading and read the file contents into the span. Finally, we return a ValueTuple containing the rented array and the length of the file, so the caller knows how many bytes of the rented array actually hold file data.

It is important to note here that to prevent memory leaks, the caller of this method must return the rented array to the ArrayPool:

ArrayPool<byte>.Shared.Return(rentedArray);
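A try/finally block is a common way to make sure the array goes back to the pool even if processing throws. The sketch below is stand-alone, so it rents and fills the array inline rather than calling ConvertToPooledArray(); the temporary file and its contents are made up for illustration:

```csharp
using System;
using System.Buffers;
using System.IO;

var path = Path.GetTempFileName();
File.WriteAllBytes(path, new byte[] { 10, 20, 30 });

var length = (int)new FileInfo(path).Length;
var rentedArray = ArrayPool<byte>.Shared.Rent(length);
try
{
    using (var fs = File.OpenRead(path))
        fs.ReadExactly(rentedArray.AsSpan(0, length));

    // Only the first `length` bytes are file data; the array may be longer
    // and the remainder can contain leftovers from earlier rentals.
    ReadOnlySpan<byte> data = rentedArray.AsSpan(0, length);
    Console.WriteLine(data.Length);                  // 3
    Console.WriteLine(rentedArray.Length >= length); // True
}
finally
{
    // Always hand the array back, even if processing failed.
    ArrayPool<byte>.Shared.Return(rentedArray);
    File.Delete(path);
}
```

After Return() is called, the array must not be used again, since the pool may hand it to another caller.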

As with our previous methods, the conversion method can also be written asynchronously:

public static async Task<(byte[] rentedArray, int length)> ConvertToPooledArrayAsync(string filePath)
{
    var fileInfo = new FileInfo(filePath);
    ArgumentOutOfRangeException.ThrowIfGreaterThan(fileInfo.Length, Array.MaxLength, "File length");

    var length = (int) fileInfo.Length;
    var array = ArrayPool<byte>.Shared.Rent(length);
    var memory = array.AsMemory(0, length);

    await using var fs = fileInfo.OpenRead();
    await fs.ReadExactlyAsync(memory);

    return (array, length);
}

Note that because we are in an asynchronous method, we use a Memory<byte> rather than a Span<byte>. Span<T> is a ref struct, so it cannot be kept alive across an await, which is why ReadExactlyAsync() accepts a Memory<byte> instead.

Convert to Byte Array Using ArrayPoolBufferWriter

Similar to our technique using MemoryStream, we can make use of ArrayPoolBufferWriter<T>, which is available in the CommunityToolkit.HighPerformance package.

First, we need to add the package to our project:

dotnet add package CommunityToolkit.HighPerformance

ArrayPoolBufferWriter<T> behaves much like MemoryStream, but instead of allocating new arrays on the heap when the internal buffer is exceeded, it rents them from the ArrayPool. This helps to reduce the memory pressure and heap fragmentation that can result from multiple temporary array allocations:

public static byte[] ConvertUsingPooledWriter(string filePath)
{
    using var writer = new ArrayPoolBufferWriter<byte>(DefaultBufferSize);
    using var stream = writer.AsStream();

    using var fs = File.OpenRead(filePath);
    fs.CopyTo(stream);

    return writer.WrittenSpan.ToArray();
}

Here we begin by creating our ArrayPoolBufferWriter with our DefaultBufferSize. We specify an initial buffer size to help avoid reallocations. Growing the buffer does not allocate new arrays on the heap, since replacement buffers are rented from the pool, but we still want to avoid the copying required each time the buffer is exceeded.

Next, we create a Stream from our buffered writer to enable stream operations on it. Following that we create a FileStream to read the contents of our file, and stream them into our buffered writer. Finally, we return the buffered contents as a new array.

As with our previous examples, we can also write this method as async:

public static async Task<byte[]> ConvertUsingPooledWriterAsync(string filePath)
{
    using var writer = new ArrayPoolBufferWriter<byte>(DefaultBufferSize);
    await using var stream = writer.AsStream();

    await using var fs = File.OpenRead(filePath);
    await fs.CopyToAsync(stream);

    return writer.WrittenSpan.ToArray();
}

Converting a Large File to a Byte Array in Chunks

As we mentioned at the beginning of our article, sometimes it is not possible to load the entire file into memory at one time. When a file’s length exceeds Array.MaxLength, we have to look for an alternative method to deal with the file contents. Here we present one approach making use of IAsyncEnumerable to return the large file in chunks:

public static async IAsyncEnumerable<byte[]> ConvertInChunksMemoryMapped(string filePath, int chunkSize,
    [EnumeratorCancellation] CancellationToken cancellationToken = default)
{
    ArgumentOutOfRangeException.ThrowIfGreaterThan(chunkSize, Array.MaxLength);

    var rentedBuffer = ArrayPool<byte>.Shared.Rent(chunkSize);
    try
    {
        var memory = rentedBuffer.AsMemory(0, chunkSize);

        var fileLength = new FileInfo(filePath).Length;
        using var mm = MemoryMappedFile.CreateFromFile(filePath);
        await using var accessor = mm.CreateViewStream(0, fileLength);

        int bytesRead;
        while ((bytesRead = await accessor.ReadAsync(memory, cancellationToken)) != 0)
            yield return memory[..bytesRead].ToArray();
    }
    finally
    {
        ArrayPool<byte>.Shared.Return(rentedBuffer);
    }
}

First, we rent a buffer from the ArrayPool to use as an internal buffer for reading a chunk of data from the file. We then create a Memory<byte> over the rented buffer to pass to ReadAsync().

Next, we create a MemoryMappedFile from our large file, which allows us to easily process the file in chunks. Once we have our MemoryMappedFile, we create a MemoryMappedViewStream over it, allowing us to process it as a stream. We need to be careful to specify the length of the view stream; otherwise, the size of the view may be larger than the source file on disk. (For more information, refer to the Microsoft documentation for CreateViewStream.)

Finally, we loop through the stream, reading into our buffer, and for each read we yield return a new array containing the current file chunk.

Note, for simplicity in the example we are returning a new array for each chunk. In a production environment, we would want to use a rented array or provide a way for the caller to provide the destination buffer.
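As a rough sketch of the rented-array alternative mentioned above, the iterator can yield a ReadOnlyMemory<byte> view over the single rented buffer instead of copying each chunk. To keep the example stand-alone it uses a FileStream rather than a memory-mapped file; ChunkReader and ReadChunksPooled are hypothetical names, and this is not the listing from the article's repository:

```csharp
using System;
using System.Buffers;
using System.Collections.Generic;
using System.IO;
using System.Runtime.CompilerServices;
using System.Threading;

var path = Path.GetTempFileName();
try
{
    File.WriteAllBytes(path, new byte[10]);

    long total = 0;
    await foreach (var chunk in ChunkReader.ReadChunksPooled(path, chunkSize: 4))
        total += chunk.Length; // process the chunk here, before the next iteration

    Console.WriteLine(total); // 10
}
finally
{
    File.Delete(path);
}

public static class ChunkReader
{
    // Each yielded chunk is a view over the same rented buffer, so it is only
    // valid until the caller asks for the next chunk.
    public static async IAsyncEnumerable<ReadOnlyMemory<byte>> ReadChunksPooled(
        string filePath, int chunkSize,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        var rentedBuffer = ArrayPool<byte>.Shared.Rent(chunkSize);
        try
        {
            await using var fs = new FileStream(filePath, FileMode.Open, FileAccess.Read,
                FileShare.Read, bufferSize: 4096, useAsync: true);

            int bytesRead;
            while ((bytesRead = await fs.ReadAsync(
                       rentedBuffer.AsMemory(0, chunkSize), cancellationToken)) != 0)
                yield return rentedBuffer.AsMemory(0, bytesRead);
        }
        finally
        {
            ArrayPool<byte>.Shared.Return(rentedBuffer);
        }
    }
}
```

The trade-off is that callers must finish with (or copy) each chunk before moving on, because the next read overwrites the same buffer.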

We could also write a similar method using FileStream instead of MemoryMappedFile (see the GitHub repo for the full code listing), but in our benchmark the MemoryMappedFile version finished in roughly half the time:

| Method                   | Mean     | Error    | StdDev   |
|------------------------- |---------:|---------:|---------:|
| ReadFileWithMemoryMapped |  7.046 s | 0.0606 s | 0.0567 s |
| ReadFileWithFileStream   | 13.685 s | 0.2484 s | 0.2324 s |

Benchmarking Our Methods

While each method has its own use cases, it is always a good idea to consider performance when choosing one. For our benchmarks, we asynchronously read a 17 KB file and compute its MD5 hash:

| Method                   | Mean     | Error   | StdDev   | Median   | Gen0    | Allocated |
|------------------------- |---------:|--------:|---------:|---------:|--------:|----------:|
| ReadFileWithMemoryStream | 254.7 us | 4.17 us |  6.11 us | 252.8 us | 30.7617 |  53.28 KB |
| ReadFileWithReadAllBytes | 257.9 us | 5.06 us |  4.48 us | 256.7 us | 11.7188 |  24.76 KB |
| ReadFileWithPooledWriter | 262.7 us | 5.64 us | 16.55 us | 257.3 us | 12.6953 |  25.25 KB |
| ReadFileWithPooledArray  | 296.5 us | 2.97 us |  2.64 us | 296.6 us |  0.4883 |   1.07 KB |
| ReadFileWithFileStream   | 349.7 us | 4.94 us |  3.85 us | 349.0 us | 14.1602 |  25.34 KB |
| ReadFileWithMemoryMapped | 368.6 us | 7.32 us | 15.29 us | 361.8 us | 12.2070 |  25.09 KB |

In terms of execution time, there isn’t much difference between using a MemoryStream, calling File.ReadAllBytes(), and using the ArrayPoolBufferWriter. Our method using a rented array is only slightly slower, but it allocates far less memory than any of the others (1.07 KB versus roughly 25 KB or more). Interestingly, our last-place method is the one involving memory-mapped files. While this technique shines when processing very large files, for smaller files it is better to just read the file directly into an array.
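For reference, the kind of work each benchmark performs (read the file, then hash its bytes) can be sketched as follows. This is only an illustration using File.ReadAllBytesAsync() and a made-up three-byte file, not the benchmark code from the repository:

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Text;

var path = Path.GetTempFileName();
try
{
    // Sample content: the ASCII text "abc".
    File.WriteAllBytes(path, Encoding.ASCII.GetBytes("abc"));

    byte[] bytes = await File.ReadAllBytesAsync(path);
    byte[] hash = MD5.HashData(bytes);

    Console.WriteLine(Convert.ToHexString(hash)); // 900150983CD24FB0D6963F7D28E17F72
}
finally
{
    File.Delete(path);
}
```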

Conclusion

In this article, we explored several methods for converting a file into an array of bytes, covering both synchronous and asynchronous techniques. Lastly, we looked at a technique for reading a very large file as a series of byte array chunks.