In this article, we will learn about situations where we may need to convert a file into a byte array. Additionally, we will learn two ways to perform the conversion in C#.
If you want to learn how to convert a byte array to a file, check out our convert byte array to a file article.
So let’s start.
What is a Byte Array?
In C#, a byte array is an array of 8-bit unsigned integers (bytes). Byte arrays are often used to represent more complex data, such as text, images, or audio.
There are several use cases in which we may want to convert a file to a byte array, including:
- Loading file contents into memory for processing
- Network transmission of file data
- File format conversion
- File encryption
Generally, a byte array is declared using the byte[] syntax:
byte[] byteArray = new byte[50];
This creates a byte array with 50 elements.
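As a small illustration of bytes representing more complex data, the following sketch encodes a string as UTF-8 bytes and decodes it again. The class and method names are hypothetical and are not part of the article's sample project.

```csharp
using System.Text;

public static class ByteArrayBasics
{
    // Encodes text as UTF-8; each array element is one unsigned 8-bit value (0-255).
    public static byte[] ToBytes(string text) => Encoding.UTF8.GetBytes(text);

    // Decodes UTF-8 bytes back into a string.
    public static string FromBytes(byte[] bytes) => Encoding.UTF8.GetString(bytes);
}
```

For plain ASCII text, each character maps to exactly one byte, so the array length equals the string length.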
Some Initial Setup
Each of our examples in this article is exercised through a unit test project. The unit tests create temporary files and delete them when all the tests have finished executing. For more about unit testing with xUnit and IClassFixture<T>, be sure to check out our article “Differences Between NUnit, xUnit and MSTest”.
When converting a file into a byte array, it is important to remember that if the file size exceeds Array.MaxLength, we will not be able to read all of the contents at once. Attempting to do so will result in an exception. In the last section of our article, we will review a technique for solving this problem by working on the file in chunks.
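One way to avoid that exception is to check the file size before choosing a strategy. The sketch below is a minimal illustration, assuming .NET 6 or later (where Array.MaxLength is available); the FileSizeGuard class and its method are hypothetical names.

```csharp
using System;
using System.IO;

public static class FileSizeGuard
{
    // Returns true when the whole file can be held by a single byte[].
    // FileInfo.Length is a long, so the comparison cannot overflow.
    public static bool FitsInSingleArray(string filePath)
    {
        var length = new FileInfo(filePath).Length; // throws FileNotFoundException if the file is missing
        return length <= Array.MaxLength;
    }
}
```

When the check returns false, the file has to be processed in chunks instead of being loaded in a single call.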
Use ReadAllBytes to Convert a File
One of the most straightforward methods for converting a file is the File.ReadAllBytes() static method:
public void GivenFile_WhenConvertingUsingReadAllBytes_ThenReturnsCorrectContent()
{
    var bytes = File.ReadAllBytes(fixture.SmallTestFile);

    bytes.Should().BeEquivalentTo(fixture.SmallTestFileExpectedBytes);
}
This method opens the file, reads all of the contents into an array, and returns it.
If we are working with asynchronous code, we can use the File.ReadAllBytesAsync() method to accomplish the same goal:
// Returning Task (rather than void) lets the test runner await the test and observe failures.
public async Task GivenFile_WhenConvertingUsingReadAllBytesAsync_ThenReturnsCorrectContent()
{
    var bytes = await File.ReadAllBytesAsync(fixture.SmallTestFile);

    bytes.Should().BeEquivalentTo(fixture.SmallTestFileExpectedBytes);
}
Convert Using MemoryStream
A second technique we can consider for converting a file to a byte array is to stream the file into a MemoryStream. While this results in additional memory pressure, if we already have an open FileStream object, we can write the stream directly into a MemoryStream and then return the result as an array:
public static byte[] ConvertUsingMemoryStream(string filePath)
{
    using var fs = File.OpenRead(filePath);
    using var ms = new MemoryStream(DefaultBufferSize);

    fs.CopyTo(ms);

    return ms.ToArray();
}
The code here is pretty straightforward. We open a FileStream for reading. Next, we initialize a new MemoryStream using a DefaultBufferSize constant (4096 for our example) as our initial stream capacity to reduce the number of reallocations in the MemoryStream while processing the file. Once our initial setup is done, we simply copy the FileStream to our MemoryStream and then convert the MemoryStream to an array.
As with the File.ReadAllBytes() method, we can also perform a conversion using a MemoryStream in an asynchronous fashion:
public static async Task<byte[]> ConvertUsingMemoryStreamAsync(string filePath)
{
    await using var fs = File.OpenRead(filePath);
    await using var ms = new MemoryStream(DefaultBufferSize);

    await fs.CopyToAsync(ms);

    return ms.ToArray();
}
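The MemoryStream approach is most useful when a stream is already open. As a sketch of that case, the hypothetical helper below accepts any readable Stream instead of a file path; the StreamSketch class name and the default capacity of 4096 are assumptions made for illustration.

```csharp
using System.IO;

public static class StreamSketch
{
    // Copies the remaining contents of an already-open stream into a new array.
    // The caller keeps ownership of 'source' and is responsible for disposing it.
    public static byte[] ToByteArray(Stream source, int initialCapacity = 4096)
    {
        using var ms = new MemoryStream(initialCapacity);
        source.CopyTo(ms);
        return ms.ToArray();
    }
}
```

Because CopyTo() starts at the stream's current position, only the bytes that have not yet been read end up in the array.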
Convert a File Using a Rented Byte Array
Both of the previous techniques involved creating new arrays (or, in the case of the MemoryStream, potentially multiple new arrays). With the introduction of ArrayPool<T> to .NET, we have some additional options that allow us to load the contents of a file into memory with far fewer new allocations. This proves especially useful when processing a large number of files and only needing the data in memory temporarily.
For a deeper dive into ArrayPool<T>, don’t miss our article: “Memory Optimization With ArrayPool in C#”. Now, let’s see how we can optimize our memory usage while loading files into a byte array.
Read File Into Rented Byte Array
Our first option is to rent an array, load the file data into it, and return the rented array:
public static (byte[] rentedArray, int length) ConvertToPooledArray(string filePath)
{
    var fileInfo = new FileInfo(filePath);
    ArgumentOutOfRangeException.ThrowIfGreaterThan(fileInfo.Length, Array.MaxLength, "File length");

    var length = (int)fileInfo.Length;
    var array = ArrayPool<byte>.Shared.Rent(length);
    var span = array.AsSpan(0, length);

    using var fs = fileInfo.OpenRead();
    fs.ReadExactly(span);

    return (array, length);
}
Here, we first load the file information and ensure the length is within the bounds of Array.MaxLength. Next, we cast the file length value to an int (which we know is safe because of our previous validation). Using this length, we rent an array from the shared ArrayPool<byte> and then take a Span<byte> of the appropriate length over the rented array. We take a Span<byte> over the array because arrays rented from the ArrayPool may be longer than the requested size.
Next, we open a FileStream for reading and read the file contents into the span. Finally, we return a ValueTuple containing the rented array and the length of the file. The caller needs the length for the same reason: only the first length bytes of the rented array hold file data.
It is important to note here that once the caller is finished with the rented array, it must return it to the ArrayPool. Otherwise, the pool cannot reuse the array and the benefit of pooling is lost:
ArrayPool<byte>.Shared.Return(rentedArray);
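To make sure the array goes back to the pool even when processing throws, the rent and return calls can be paired in a try/finally block. The following is a self-contained sketch of that pattern, assuming .NET 7 or later for ReadExactly(); the PooledArrayUsage class and its SumBytes method are hypothetical and only stand in for real processing.

```csharp
using System;
using System.Buffers;
using System.IO;

public static class PooledArrayUsage
{
    // Hypothetical consumer: sums every byte of a file using a rented buffer.
    public static long SumBytes(string filePath)
    {
        // checked() throws OverflowException if the file is too large for an int length.
        var length = checked((int)new FileInfo(filePath).Length);
        var rented = ArrayPool<byte>.Shared.Rent(length);
        try
        {
            using var fs = File.OpenRead(filePath);
            fs.ReadExactly(rented.AsSpan(0, length));

            long sum = 0;
            // Only the first 'length' bytes are file data; the rest of the array is unrelated.
            foreach (var b in rented.AsSpan(0, length))
                sum += b;

            return sum;
        }
        finally
        {
            // Runs on success and on failure, so the array always goes back to the pool.
            ArrayPool<byte>.Shared.Return(rented);
        }
    }
}
```

After Return() is called, the array must not be touched again, because the pool may hand it to another caller.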
As with our previous methods, the conversion method can also be written asynchronously:
public static async Task<(byte[] rentedArray, int length)> ConvertToPooledArrayAsync(string filePath)
{
    var fileInfo = new FileInfo(filePath);
    ArgumentOutOfRangeException.ThrowIfGreaterThan(fileInfo.Length, Array.MaxLength, "File length");

    var length = (int)fileInfo.Length;
    var array = ArrayPool<byte>.Shared.Rent(length);
    var memory = array.AsMemory(0, length);

    await using var fs = fileInfo.OpenRead();
    await fs.ReadExactlyAsync(memory);

    return (array, length);
}
Note here that because we are in an asynchronous method, we must use a Memory<byte> rather than a Span<byte>. Span<T> is a ref struct, so it cannot be held across an await, and ReadExactlyAsync() accepts a Memory<byte> for that reason.
Convert to Byte Array Using ArrayPoolBufferWriter
Similar to our technique using MemoryStream, we can make use of ArrayPoolBufferWriter<T>, which is available in the CommunityToolkit.HighPerformance package.
First, we need to add the package to our project:
dotnet add package CommunityToolkit.HighPerformance
ArrayPoolBufferWriter<T> behaves much like MemoryStream, but instead of allocating new arrays on the heap when the internal buffer is exceeded, it rents them from the ArrayPool. This helps to prevent the memory pressure and heap fragmentation that can result from multiple temporary array allocations:
public static byte[] ConvertUsingPooledWriter(string filePath)
{
    using var writer = new ArrayPoolBufferWriter<byte>(DefaultBufferSize);
    using var stream = writer.AsStream();
    using var fs = File.OpenRead(filePath);

    fs.CopyTo(stream);

    return writer.WrittenSpan.ToArray();
}
Here we begin by creating our ArrayPoolBufferWriter<byte> with our DefaultBufferSize. We use an initial buffer size to help avoid reallocations. While in this case we are not concerned about the memory allocations that result from exceeding the buffer, we still want to avoid the copying required when the buffer is exceeded.
Next, we create a Stream from our buffered writer to enable stream operations on it. Following that, we create a FileStream to read the contents of our file and stream them into our buffered writer. Finally, we return the buffered contents as a new array.
As with our previous examples, we can also write this method as async:
public static async Task<byte[]> ConvertUsingPooledWriterAsync(string filePath)
{
    using var writer = new ArrayPoolBufferWriter<byte>(DefaultBufferSize);
    await using var stream = writer.AsStream();
    await using var fs = File.OpenRead(filePath);

    await fs.CopyToAsync(stream);

    return writer.WrittenSpan.ToArray();
}
Converting a Large File to a Byte Array in Chunks
As we mentioned at the beginning of our article, sometimes it is not possible to load the entire file into memory at one time. When a file’s length exceeds Array.MaxLength, we have to look for an alternative method to deal with the file contents. Here we present one approach that makes use of IAsyncEnumerable<T> to return the large file in chunks:
public static async IAsyncEnumerable<byte[]> ConvertInChunksMemoryMapped(
    string filePath,
    int chunkSize,
    [EnumeratorCancellation] CancellationToken cancellationToken = default)
{
    ArgumentOutOfRangeException.ThrowIfGreaterThan(chunkSize, Array.MaxLength);

    var rentedBuffer = ArrayPool<byte>.Shared.Rent(chunkSize);
    try
    {
        var memory = rentedBuffer.AsMemory(0, chunkSize);
        var fileLength = new FileInfo(filePath).Length;

        using var mm = MemoryMappedFile.CreateFromFile(filePath);
        await using var accessor = mm.CreateViewStream(0, fileLength);

        int bytesRead;
        while ((bytesRead = await accessor.ReadAsync(memory, cancellationToken)) != 0)
            yield return memory[..bytesRead].ToArray();
    }
    finally
    {
        ArrayPool<byte>.Shared.Return(rentedBuffer);
    }
}
First, we begin by renting a buffer from the ArrayPool to use as an internal buffer for reading a chunk of data from the file. Next, we create a Memory<byte> over the rented buffer to use in our ReadAsync() method.
Next, we create a MemoryMappedFile from our large file, which allows us to easily process the file in chunks. Once we have our MemoryMappedFile, we create a MemoryMappedViewStream over it, allowing us to process it as a stream. We need to be careful to specify the length of the view stream; otherwise, the size of the view may be larger than the source file on disk. (For more information, refer to the Microsoft documentation for CreateViewStream.)
Next, we loop through the stream, reading into our buffer. Lastly, we yield return a new array containing the current file chunk.
Note, for simplicity in the example we are returning a new array for each chunk. In a production environment, we would want to use a rented array or provide a way for the caller to provide the destination buffer.
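As a rough sketch of the caller-supplied buffer idea, the method below reads each chunk into a buffer the caller owns and passes the filled slice to a callback, so no array is allocated per chunk. It uses a plain FileStream rather than a memory-mapped file to stay short. The ChunkedReader class, the onChunk callback, and the 4096-byte internal buffer size are all assumptions for illustration, not code from the article's repository.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public static class ChunkedReader
{
    // Reads the file into the caller's buffer one chunk at a time and returns the total bytes read.
    // The slice passed to onChunk is only valid until the next read overwrites the buffer.
    public static async Task<long> ReadInChunksAsync(
        string filePath,
        Memory<byte> buffer,
        Action<ReadOnlyMemory<byte>> onChunk,
        CancellationToken cancellationToken = default)
    {
        if (buffer.IsEmpty)
            throw new ArgumentException("Buffer must not be empty.", nameof(buffer));

        await using var fs = new FileStream(
            filePath, FileMode.Open, FileAccess.Read, FileShare.Read,
            bufferSize: 4096, useAsync: true);

        long total = 0;
        int bytesRead;
        while ((bytesRead = await fs.ReadAsync(buffer, cancellationToken)) != 0)
        {
            onChunk(buffer[..bytesRead]); // hand over only the bytes that were actually read
            total += bytesRead;
        }

        return total;
    }
}
```

A caller could rent the buffer from ArrayPool<byte>.Shared, pass rented.AsMemory(0, chunkSize), and return the array once the method completes.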
We could also write a similar method using FileStream (see the GitHub repo for the full code listing) instead of MemoryMappedFile, but in our benchmark, the memory-mapped version took roughly half the time:
| Method                   | Mean     | Error    | StdDev   |
|------------------------- |---------:|---------:|---------:|
| ReadFileWithMemoryMapped |  7.046 s | 0.0606 s | 0.0567 s |
| ReadFileWithFileStream   | 13.685 s | 0.2484 s | 0.2324 s |
Benchmarking Our Methods
While each method has its own use cases, it is always a good idea to also consider performance when choosing a method. For our benchmarks, we are asynchronously reading a 17 KB file and computing its MD5 hash:
| Method                   | Mean     | Error   | StdDev   | Median   | Gen0    | Allocated |
|------------------------- |---------:|--------:|---------:|---------:|--------:|----------:|
| ReadFileWithMemoryStream | 254.7 us | 4.17 us |  6.11 us | 252.8 us | 30.7617 |  53.28 KB |
| ReadFileWithReadAllBytes | 257.9 us | 5.06 us |  4.48 us | 256.7 us | 11.7188 |  24.76 KB |
| ReadFileWithPooledWriter | 262.7 us | 5.64 us | 16.55 us | 257.3 us | 12.6953 |  25.25 KB |
| ReadFileWithPooledArray  | 296.5 us | 2.97 us |  2.64 us | 296.6 us |  0.4883 |   1.07 KB |
| ReadFileWithFileStream   | 349.7 us | 4.94 us |  3.85 us | 349.0 us | 14.1602 |  25.34 KB |
| ReadFileWithMemoryMapped | 368.6 us | 7.32 us | 15.29 us | 361.8 us | 12.2070 |  25.09 KB |
As far as the overall performance goes, there isn’t much difference between using a MemoryStream, calling File.ReadAllBytes(), and using the ArrayPoolBufferWriter. Our method using a rented array is only slightly slower, but when considering memory allocations, it outshines them all, allocating about 1 KB per operation compared with roughly 25 KB or more for the others. Interestingly, our last-place method is the one involving memory-mapped files. While this technique shines when processing very large files, for smaller files it is better to just read the file directly into an array.
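The benchmark code itself is not shown in this article. As an assumption about the general shape of the measured work, the sketch below reads a file asynchronously and hashes the resulting bytes with MD5. The HashExample class and its method are hypothetical, and MD5 is used here only as a workload, not as a recommendation for security-sensitive hashing.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Threading.Tasks;

public static class HashExample
{
    // Reads the whole file into a byte array and returns its MD5 hash as an uppercase hex string.
    public static async Task<string> ComputeMd5HexAsync(string filePath)
    {
        var bytes = await File.ReadAllBytesAsync(filePath);
        var hash = MD5.HashData(bytes); // 16-byte digest
        return Convert.ToHexString(hash);
    }
}
```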
Conclusion
In this article, we explored several methods for converting a file into an array of bytes. We explored both synchronous and asynchronous techniques. Lastly, we explored a technique for reading a very large file as byte array chunks.