In this article, we look into various ways to calculate the size of a directory.

Applications dealing with file management or storage inevitably need to know how much space a directory takes up. Calculating the directory size helps with disk space management as we can use it to strategize data distribution, identify large directories, and generate reports for users. 

To download the source code for this article, you can visit our GitHub repository.

Let’s start.


How to Calculate the Size of a Directory

As with most problems, there are multiple ways to tackle this one. That said, any method for calculating the size of a directory needs to visit every file within it. For each file, we read its length in bytes and add it to a running total. We also need to descend into subdirectories so that the files they contain are counted as well.

Note that the size of a directory is not the same as its size on disk. The size we are about to calculate is the actual amount of data the files in the directory contain. The size on disk, on the other hand, is the space the file system allocates for those files. Because storage is allocated in whole clusters (blocks), the size on disk is usually equal to or larger than the directory size, although features such as file compression or sparse files can make it smaller.
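To make the difference concrete, here is a minimal sketch that estimates how much space a single file occupies on disk. It assumes a fixed cluster size of 4,096 bytes (a common default, but an assumption here) and ignores compression, sparse files, and file systems that store very small files inside their metadata. SizeOnDiskEstimate and RoundUpToClusters() are hypothetical names, not .NET APIs:

```csharp
using System;

public static class SizeOnDiskEstimate
{
    // Hypothetical helper: rounds a file's length up to a whole number of clusters.
    // Assumes an uncompressed, non-sparse file and a fixed cluster size.
    public static long RoundUpToClusters(long lengthInBytes, long clusterSize = 4096)
    {
        if (clusterSize <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(clusterSize));
        }

        // In this simplified model, an empty file needs no clusters.
        if (lengthInBytes <= 0)
        {
            return 0;
        }

        // Integer ceiling division: any partial cluster still counts as a full one.
        long clusters = (lengthInBytes + clusterSize - 1) / clusterSize;
        return clusters * clusterSize;
    }
}
```

For example, two files of 1,000 and 5,000 bytes hold 6,000 bytes of data, but with 4,096-byte clusters this estimate puts them at 4,096 + 8,192 = 12,288 bytes on disk.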

Now, we can move on to how to calculate the size.

Let’s create a class named DirectorySizeCalculator. In it, we define three different methods to get the size of a directory. For a directory we can fully access, all of them should yield the same value. The Directory and DirectoryInfo classes from the System.IO namespace provide helpful properties and methods to achieve our goals.
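To see how the two classes differ, consider a minimal sketch that sums the sizes of the files directly inside a folder (ignoring subdirectories) in both styles. DirectoryApiComparison and its methods are hypothetical names used only for illustration:

```csharp
using System.IO;
using System.Linq;

public static class DirectoryApiComparison
{
    // Directory is a static class that works with path strings,
    // so we create a FileInfo for each path to read its Length.
    public static long TopLevelBytesWithDirectory(string path) =>
        Directory.GetFiles(path).Sum(file => new FileInfo(file).Length);

    // DirectoryInfo is an instance class whose GetFiles() returns
    // FileInfo objects that already expose Length.
    public static long TopLevelBytesWithDirectoryInfo(string path) =>
        new DirectoryInfo(path).GetFiles().Sum(file => file.Length);
}
```

Both methods return the same number; the choice mainly affects whether we pass paths or DirectoryInfo/FileInfo objects around, and we use both styles in the methods that follow.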

Determine Directory Size Via Recursion

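A recursive method is a method that calls itself. To learn more about recursion, check out our article. Because a directory tree is itself a recursive structure (directories contain other directories), recursion lets us express the traversal in very little, readable code.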

We call our method GetSizeWithRecursion():

public static long GetSizeWithRecursion(DirectoryInfo directory)
{
    if (directory == null || !directory.Exists)
    {
        throw new DirectoryNotFoundException("Directory does not exist.");
    }

    long size = 0;

    try
    {
        size += directory.GetFiles().Sum(file => file.Length);

        size += directory.GetDirectories().Sum(GetSizeWithRecursion);
    }
    catch (UnauthorizedAccessException)
    {
        Console.WriteLine($"We do not have access to {directory}");
    }
    catch (Exception ex)
    {
        Console.WriteLine($"We encountered an error processing {directory}: ");
        Console.WriteLine($"{ex.Message}");
    }

    return size;
}

Our method takes an instance of the DirectoryInfo class representing the root directory and initializes size to keep track of the total. Using GetFiles(), we retrieve all the files in the directory, and by calling Sum() we add up their Length values. Next, we retrieve the subdirectories with GetDirectories() and call Sum() on them as well. This time, we pass just the method name, GetSizeWithRecursion, because its signature matches the selector parameter of the Sum<TSource>(this IEnumerable<TSource> source, Func<TSource, long> selector) extension method. Each recursive call returns the size of one subdirectory, and when there are no subdirectories, Sum() simply returns 0.
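Stripped of the existence check and the error handling, the recursion reduces to a single expression. The following sketch isolates the method-group call to Sum(); CompactRecursion and Size() are hypothetical names, and without the try/catch a single inaccessible folder throws and ends the whole calculation:

```csharp
using System.IO;
using System.Linq;

public static class CompactRecursion
{
    // Size of the files at this level plus the sizes of all subdirectories.
    // Sum(Size) passes the method group, which converts to Func<DirectoryInfo, long>.
    // No error handling: UnauthorizedAccessException propagates to the caller.
    public static long Size(DirectoryInfo directory) =>
        directory.GetFiles().Sum(file => file.Length)
        + directory.GetDirectories().Sum(Size);
}
```

For a folder containing a 100-byte file, with a subfolder containing a 200-byte file and a nested subfolder containing a 300-byte file, Size() returns 600.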

Recursive solutions are concise and readable. However, every subdirectory costs an extra method call and stack frame. That overhead is usually small compared with the cost of the file system calls themselves, but it can add up for very large trees. Let’s move forward and consider other ways to determine the size of a directory.

An Iterative Method to Compute the Size

C# has four kinds of loops: while, do-while, for, and foreach. Let’s define another method, GetSizeByIteration(), which uses a stack and a while loop to calculate the directory size:

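public static long GetSizeByIteration(string directoryPath)
{
    if (!Directory.Exists(directoryPath))
    {
        throw new DirectoryNotFoundException("Directory does not exist.");
    }

    long size = 0;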
    var stack = new Stack<string>();

    stack.Push(directoryPath);

    while (stack.Count > 0)
    {
        string directory = stack.Pop();

        try
        {
            var files = Directory.GetFiles(directory);

            foreach (var file in files)
            {
                size += new FileInfo(file).Length;
            }

            var subDirectories = Directory.GetDirectories(directory);

            foreach (var subDirectory in subDirectories)
            {
                stack.Push(subDirectory);
            }
        }
        catch (UnauthorizedAccessException)
        {
            Console.WriteLine($"We do not have access to {directory}");
        }
        catch (Exception ex)
        {
            Console.WriteLine($"We encountered an error processing {directory}: ");
            Console.WriteLine($"{ex.Message}");
        }
    }

    return size;
}

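In this method, the parameter is a directoryPath string rather than a DirectoryInfo instance as in the recursive method. We explain when to choose between the Directory and DirectoryInfo classes in a separate article. We initialize a stack of strings and push the root path onto it. The while loop then runs as long as the stack is not empty: on each pass, it pops a directory, adds the length of each of its files to size, and pushes its subdirectories onto the stack so that later iterations process them.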
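The choice of a stack is not essential. As a sketch, swapping the Stack<string> for a Queue<DirectoryInfo> visits directories level by level (breadth-first) instead of depth-first, and the total stays the same because every directory is still visited exactly once, assuming the tree contains no symbolic-link cycles. BreadthFirstSize and GetSizeWithQueue() are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class BreadthFirstSize
{
    // Hypothetical variant of GetSizeByIteration() that uses a queue
    // (breadth-first order) and DirectoryInfo instead of path strings.
    public static long GetSizeWithQueue(DirectoryInfo root)
    {
        long size = 0;
        var queue = new Queue<DirectoryInfo>();
        queue.Enqueue(root);

        while (queue.Count > 0)
        {
            DirectoryInfo current = queue.Dequeue();

            try
            {
                // FileInfo objects returned by DirectoryInfo already carry Length.
                size += current.GetFiles().Sum(file => file.Length);

                // Subdirectories go to the back of the queue, so the current
                // level is finished before the next one starts.
                foreach (DirectoryInfo sub in current.GetDirectories())
                {
                    queue.Enqueue(sub);
                }
            }
            catch (UnauthorizedAccessException)
            {
                Console.WriteLine($"We do not have access to {current.FullName}");
            }
        }

        return size;
    }
}
```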

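The iterative method avoids the method call overhead of recursion, and because the stack lives on the heap, deep nesting cannot overflow the call stack. The trade-off is that we manage the traversal state ourselves: the code is longer than the recursive version, and the stack holds every directory still waiting to be processed, so its memory use grows with the size of the tree.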

Finally, let’s see how processing files on multiple threads at once could make things faster.

Calculating Directory Size In Parallel

Parallel processing uses multiple threads to execute tasks simultaneously. We cover how this works in detail in a separate article.

Now, let’s write our last method, GetSizeByParallelProcessing():

public static long GetSizeByParallelProcessing(DirectoryInfo directory, 
SearchOption searchOption = SearchOption.AllDirectories)
{
    if (directory == null || !directory.Exists)
    {
        throw new DirectoryNotFoundException("Directory does not exist.");
    }

    long size = 0;

    try
    {
        Parallel.ForEach(directory.EnumerateFiles("*", searchOption), fileInfo =>
        {
            try
            {
                Interlocked.Add(ref size, fileInfo.Length);
            }
            catch (UnauthorizedAccessException)
            {
                Console.WriteLine($"Unauthorized access to {fileInfo.FullName}");
            }
            catch (Exception ex)
            {
                Console.WriteLine($"Error processing {fileInfo.FullName}: ");
                Console.WriteLine($"{ex.Message}");
            }
        });
    }
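    catch (UnauthorizedAccessException)
    {
        Console.WriteLine($"Unauthorized access to {directory.FullName}");
    }
    catch (AggregateException ae)
    {
        // Parallel.ForEach() wraps exceptions from the loop body and from the
        // file enumeration (e.g., an inaccessible subdirectory) in an AggregateException.
        foreach (var inner in ae.Flatten().InnerExceptions)
        {
            Console.WriteLine($"Error processing {directory.FullName}: {inner.Message}");
        }
    }
    catch (Exception ex)
    {
        Console.WriteLine($"Error processing {directory.FullName}: ");
        Console.WriteLine($"{ex.Message}");
    }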

    return size;
}

This method includes a SearchOption parameter that we pass to EnumerateFiles(). SearchOption is an enum that lets us choose whether to consider only the top directory or to include its subdirectories as well; in other words, it controls the depth of our traversal. SearchOption.AllDirectories includes all subdirectories, while SearchOption.TopDirectoryOnly considers only the files directly inside the specified directory.

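Parallel.ForEach() processes the files of the directory (and its subdirectories) concurrently. Because multiple threads update the same variable, size, at the same time, we use Interlocked.Add(). The Interlocked class performs certain operations atomically, which prevents race conditions. Note that Parallel.ForEach() wraps exceptions thrown during the loop, including those raised while enumerating the files, in an AggregateException. Also, with SearchOption.AllDirectories, a single inaccessible subdirectory causes the enumeration to throw, which stops the whole loop.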
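Calling Interlocked.Add() once per file makes every thread contend for the same variable. A common alternative is the Parallel.ForEach() overload with thread-local state: each worker keeps its own subtotal and merges it into the shared total only once. This is a minimal sketch with the error handling omitted; ParallelSubtotals and GetSizeWithSubtotals() are hypothetical names:

```csharp
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelSubtotals
{
    // Hypothetical variant of GetSizeByParallelProcessing() without error handling.
    public static long GetSizeWithSubtotals(DirectoryInfo directory)
    {
        long size = 0;

        Parallel.ForEach<FileInfo, long>(
            directory.EnumerateFiles("*", SearchOption.AllDirectories),
            // localInit: each worker starts its own subtotal at zero.
            () => 0L,
            // body: add the file's length to this worker's subtotal only.
            (file, loopState, subtotal) => subtotal + file.Length,
            // localFinally: merge each worker's subtotal into the shared total once.
            subtotal => Interlocked.Add(ref size, subtotal));

        return size;
    }
}
```

The shared variable is now touched once per worker rather than once per file, so Interlocked.Add() is still needed, just far less often.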

EnumerateFiles() and EnumerateDirectories() are more memory-efficient than GetFiles() and GetDirectories() because they return items lazily: we can start processing the first entries before the whole list has been read, instead of loading every entry into an array upfront. The enumerate methods have overloads that take a searchPattern and a SearchOption parameter. The search pattern restricts which files or directories we enumerate (for example, "*.log"), while SearchOption, as we have already seen, specifies the depth of the traversal.
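On .NET Core 2.1 and later, the enumerate methods also accept an EnumerationOptions object, which can skip inaccessible entries instead of throwing. The following is a minimal sketch, assuming a modern .NET runtime (EnumerationOptions is not available in .NET Framework); SkipInaccessibleSize and GetSizeSkippingInaccessible() are hypothetical names:

```csharp
using System.IO;
using System.Linq;

public static class SkipInaccessibleSize
{
    // Hypothetical helper that maps SearchOption onto EnumerationOptions.
    public static long GetSizeSkippingInaccessible(
        DirectoryInfo directory,
        SearchOption searchOption = SearchOption.AllDirectories)
    {
        var options = new EnumerationOptions
        {
            // Same depth control as SearchOption.
            RecurseSubdirectories = searchOption == SearchOption.AllDirectories,
            // Skip files and folders we cannot open instead of throwing.
            IgnoreInaccessible = true,
            // By default EnumerationOptions skips Hidden and System entries;
            // 0 skips nothing, so those files are counted too.
            AttributesToSkip = 0
        };

        return directory.EnumerateFiles("*", options).Sum(file => file.Length);
    }
}
```

Note the explicit AttributesToSkip: leaving the default in place would silently exclude hidden and system files and produce a smaller total than the other methods.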

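Splitting work into smaller pieces that run in parallel can improve performance and keep the system responsive. However, it uses more CPU, especially when processing many subdirectories concurrently. Also keep in mind that in our method a single enumerator still walks the directory tree and supplies the file lengths, so the actual speed-up depends on the storage and the shape of the directory tree and is worth measuring.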

Conclusion

We wrote three methods that calculate the size of a directory. The advantages of each method come with certain tradeoffs. Therefore, when deciding whether to use recursive, iterative, or parallel processing methods, there are a few key factors to consider. These include the overall size of the directory structure, resource constraints, performance requirements, and any other unique needs of our application. By accounting for all these variables, we can decide on the best-suited method to calculate our directory size. 
