In this article, we will explore the different approaches to retrying failed HttpClient requests using Polly. We will also discuss the advantages and disadvantages of each retry strategy, from simplest to most sophisticated.

To download the source code for this article, you can visit our GitHub repository.

Let’s start.

Reasons for HttpClient Request Failures

We like to think of the Internet as a reliable data transmission medium. Sadly, this is not the case. When developing applications that rely on fetching data from other sources, it’s important to keep this in mind and prepare for the various ways a request can fail. Let’s briefly consider the most common causes of network errors and how to handle them.

Underlying Network Error

The lowest-level source of failure is the network itself. Maybe our server’s network cable is broken, the ISP has stability issues, DNS resolution failed, or the TCP handshake didn’t complete. Whatever the cause, we cannot fix underlying network issues from the application layer. What we can do is retry, and we’ll take a look at different retry strategies shortly.

Timeout

A timeout is technically also an underlying network error; however, it has a different meaning than, for example, a DNS failure. When a timeout occurs, we can assume that the server is either overloaded and wasn’t able to process our request promptly, or completely down. This affects when and how often we retry the request, since we don’t want to put even more pressure on an overloaded server.
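In HttpClient terms, these two failure modes surface as different exceptions, so we can tell them apart. Here is a minimal sketch (the endpoint is just our test URL; on .NET 5 and later, an elapsed HttpClient timeout throws a TaskCanceledException wrapping a TimeoutException):

using var client = new HttpClient { Timeout = TimeSpan.FromSeconds(5) };
try
{
    await client.GetAsync("https://jsonplaceholder.typicode.com/users");
}
catch (TaskCanceledException ex) when (ex.InnerException is TimeoutException)
{
    // The request timed out: back off before retrying, so we don't put
    // even more pressure on a potentially overloaded server
    Console.WriteLine("Timeout");
}
catch (HttpRequestException)
{
    // DNS, TCP, or TLS failure: worth retrying, since we can't fix the
    // network from the application layer
    Console.WriteLine("Underlying network error");
}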


Client Errors

Client errors are application-level errors, indicated by the 400-499 HTTP status codes. They mean many different things, from malformed requests to authentication failures and rate limiting. Generally, client errors shouldn’t be retried with the same parameters and environment, as they most commonly indicate a coding or configuration error on the client side of the integration. The 429 status code is an exception, indicating that the server has rate-limited us. Usually, the response (typically via the Retry-After header) specifies when we’re permitted to send the next request, enabling us to time our retry accordingly.

Server Errors

Server errors are application-level errors as well and are indicated by the 500-599 HTTP status codes. They mean that the server encountered an error during request processing that it couldn’t handle. For example, if the application behind a gateway is not running, the gateway will typically respond with 503. Mostly, these are errors that the server administrator should resolve. However, they are worth retrying, because perhaps the server just restarted or another intermittent server-side issue occurred.
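To make the retry decision concrete, here is a minimal sketch of it in code (the HttpErrors class and IsWorthRetrying method are our own illustration, not part of any library):

using System.Net;

public static class HttpErrors
{
    // Retry server errors and rate limiting; other client errors indicate a
    // problem that retrying the same request won't fix
    public static bool IsWorthRetrying(HttpStatusCode statusCode) =>
        (int)statusCode >= 500 || statusCode == HttpStatusCode.TooManyRequests;
}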

Now let’s take a look at Polly, a NuGet package that makes retrying requests much easier.

Polly

Polly is a .NET library that simplifies handling transient failures and retries. Its goal is to make it easy to set up and manage retry strategies with the help of its fluent interface. We can also define different actions for different errors, customize the behavior of the Polly pipeline, and implement various retry strategies in just a few lines of code.

Let’s create a console application and install Polly:

dotnet add package Polly

With our initial project setup complete, let’s set up Dev Proxy, another useful tool for testing resiliency.

Dev Proxy

Dev Proxy is an API simulator developed by Microsoft. It intercepts HTTP messages and periodically returns an error for them instead of calling the real API. It is a very useful tool for testing our HTTP error-handling resiliency. Let’s use this tool to simulate API failures locally.

First, install it using winget:

winget install Microsoft.DevProxy --silent

Now let’s restart the terminal so Windows can pick up the new environment variables set by winget, then start Dev Proxy by running the devproxy command. We trust its self-signed certificate when prompted, note the installation folder in the output (it is in %LocalAppData%\Programs by default), and then shut Dev Proxy down with Ctrl+C.

From the Dev Proxy installation folder, let’s copy the devproxyrc.json and devproxy-errors.json configuration files into our project’s root directory. By default, Dev Proxy fails 50% of the requests it intercepts, but to make every request fail, let’s change the rate property in devproxyrc.json to 100. Finally, let’s switch to our project’s root directory in the terminal and start Dev Proxy, using the --config-file flag to specify our modified configuration file:

devproxy --config-file devproxyrc.json

Dev Proxy’s default configuration watches for URLs matching the https://jsonplaceholder.typicode.com/* pattern, so we will use this URL to test our API calls. It randomly chooses an error response from devproxy-errors.json and returns it. Let’s remove all the responses except the 429, since this error will be enough for our testing; of course, we could add any kind of response here. Dev Proxy monitors the config files for changes and applies them while running, so we don’t have to restart it.
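For orientation, the relevant parts of devproxyrc.json look roughly like this (an abridged sketch based on the defaults our Dev Proxy version shipped with; the exact structure may differ between releases):

{
  "plugins": [
    {
      "name": "GenericRandomErrorPlugin",
      "enabled": true,
      "configSection": "genericRandomErrorPlugin"
    }
  ],
  "urlsToWatch": [
    "https://jsonplaceholder.typicode.com/*"
  ],
  "genericRandomErrorPlugin": {
    "errorsFile": "devproxy-errors.json"
  },
  "rate": 100
}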

With all the prerequisites set up, let’s discuss the various retry strategies, and how to implement them.

Retry Strategies Using Polly

A retry strategy is the collection of rules and techniques we employ to retry a failed operation. Each strategy has benefits and drawbacks, so choosing the appropriate one for our use case is important. Let’s take a look at each, starting with the simplest and progressing to the more complex.

Fixed Retry Strategy

The most straightforward retry strategy is the fixed retry. We specify the number and interval of retries. Let’s implement a fixed retry strategy with Polly in a new static RetryStrategy class, retrying a failed request 3 times and waiting for 1 second between each retry:

using Polly;
using Polly.Retry;

public static class RetryStrategy
{
    public static async Task ExecuteAsync()
    {
        var retryOptions = new RetryStrategyOptions
        {
            MaxRetryAttempts = 3,
            Delay = TimeSpan.FromSeconds(1),
            OnRetry = args =>
            {
                Console.WriteLine($"OnRetry, Attempt: {args.AttemptNumber + 1}, Delay: {args.RetryDelay}");
                return default;
            }
        };

        var pipeline = new ResiliencePipelineBuilder()
            .AddRetry(retryOptions)
            .Build();

        await pipeline.ExecuteAsync(async cancellationToken =>
        {
            var response =
                await new HttpClient().GetAsync("https://jsonplaceholder.typicode.com/users", cancellationToken);
            Console.WriteLine($"HTTP Status Code: {response.StatusCode}");
            response.EnsureSuccessStatusCode();
        });
    }
}

First, we create an instance of the RetryStrategyOptions class and specify the MaxRetryAttempts to be 3, with a Delay of 1 second between each. We also specify an OnRetry callback that logs the retry attempt’s number to the console. Next, we build the pipeline by passing in the retry strategy options. Finally, we make an HTTP call to the https://jsonplaceholder.typicode.com/users endpoint inside the pipeline’s ExecuteAsync() method, log the response’s status code, and throw an exception if it doesn’t indicate success. This is an important step, since Polly won’t retry if we don’t throw an exception. It’s also crucial to note that creating an HttpClient using the new keyword is a bad practice; we only use it here for testing purposes.
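In a real application, we would instead reuse a single HttpClient instance, or better, obtain one from IHttpClientFactory, to avoid socket exhaustion. A minimal sketch of the first option:

public static class RetryStrategy
{
    // One shared instance for the lifetime of the application; in a real app,
    // we would typically get clients from IHttpClientFactory instead
    private static readonly HttpClient Client = new();

    // ...the pipeline code stays the same, but calls Client.GetAsync(...)
}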

Let’s call our strategy in the Program class:

await RetryStrategy.ExecuteAsync();

Running the application produces the following output:

HTTP Status Code: TooManyRequests
OnRetry, Attempt: 1, Delay: 00:00:01
HTTP Status Code: TooManyRequests
OnRetry, Attempt: 2, Delay: 00:00:01
HTTP Status Code: TooManyRequests
OnRetry, Attempt: 3, Delay: 00:00:01
HTTP Status Code: TooManyRequests
Unhandled exception. System.Net.Http.HttpRequestException:
Response status code does not indicate success: 429 (Too Many Requests).

The first status code line is the initial call; each OnRetry line is then followed by another failed call, so we see four calls in total: the initial attempt plus 3 retries. Once the retries are exhausted, the failure surfaces as an exception.

The main benefit of the fixed retry strategy is its simplicity. In some cases, this may be enough; however, it has several drawbacks. The retry interval and attempt count remain fixed, so we cannot adjust them based on the server’s response. If we run multiple instances of the application, the retry attempts will also synchronize, causing spikes in server load. Due to the fixed intervals, it’s also likely that we won’t give the server enough time to recover. Let’s look at another strategy that addresses some of these issues.

Exponential Backoff Retry Strategy

The exponential backoff strategy improves on the fixed strategy by doubling the interval between retries. With more delay between attempts, we reduce the load on the server and help it recover. And the great thing with Polly is that we can implement it by adding just a single line of code:

var retryOptions = new RetryStrategyOptions
{
    MaxRetryAttempts = 3,
    BackoffType = DelayBackoffType.Exponential,
    Delay = TimeSpan.FromSeconds(1),
    OnRetry = args =>
    {
        Console.WriteLine($"OnRetry, Attempt: {args.AttemptNumber + 1}, Delay: {args.RetryDelay}");
        return default;
    }
};

We specify the BackoffType as Exponential, which keeps the initial delay at 1 second but then doubles it with each retry (1, 2, then 4 seconds). Let’s run the application and inspect the output:

HTTP Status Code: TooManyRequests
OnRetry, Attempt: 1, Delay: 00:00:01
HTTP Status Code: TooManyRequests
OnRetry, Attempt: 2, Delay: 00:00:02
HTTP Status Code: TooManyRequests
OnRetry, Attempt: 3, Delay: 00:00:04
HTTP Status Code: TooManyRequests
Unhandled exception. System.Net.Http.HttpRequestException:
Response status code does not indicate success: 429 (Too Many Requests).

So we successfully reduced the load on the server, possibly allowing it to recover faster. However, the exponential backoff strategy still has some drawbacks. It doesn’t take the server’s response into account when deciding when to retry. Moreover, it still creates load peaks, since the retry attempts remain synchronized between clients. Let’s solve this synchronization issue.

Randomized Exponential Backoff Retry Strategy

The randomized exponential backoff strategy differs from plain exponential backoff only in that it adds a small random offset (jitter) to each delay. This helps distribute the retry attempts and prevent load peaks. Polly can calculate this jitter in a cryptographically safe random way:

var retryOptions = new RetryStrategyOptions
{
    MaxRetryAttempts = 3,
    BackoffType = DelayBackoffType.Exponential,
    Delay = TimeSpan.FromSeconds(1),
    UseJitter = true,
    OnRetry = args =>
    {
        Console.WriteLine($"OnRetry, Attempt: {args.AttemptNumber + 1}, Delay: {args.RetryDelay}");
        return default;
    }
};

Let’s run the application and take a look at the output:

HTTP Status Code: TooManyRequests
OnRetry, Attempt: 1, Delay: 00:00:00.6116788
HTTP Status Code: TooManyRequests
OnRetry, Attempt: 2, Delay: 00:00:01.8281117
HTTP Status Code: TooManyRequests
OnRetry, Attempt: 3, Delay: 00:00:01.5587299
HTTP Status Code: TooManyRequests
Unhandled exception. System.Net.Http.HttpRequestException:
Response status code does not indicate success: 429 (Too Many Requests).

It’s not quite what we expected: the first delay is less than 1 second, and the third attempt’s delay is shorter than the second’s. This is because Polly uses the Decorrelated Jitter Backoff V2 algorithm to calculate jitter for exponential backoff. With a base delay of 1 second, the jitter can be significant enough that a delay ends up shorter than the previous one. The algorithm generates a series of delays: if we ran it a million times, the median of each element would statistically fall close to the 1x, 2x, 4x, etc. multiples of the initial delay. However, because the calculation of each element takes the previous element into account, an individual delay can land significantly far from these values.

We’ve addressed all the problems with the fixed retry strategy except for one: the randomized exponential backoff still doesn’t consider the server’s response when determining when to retry. Let’s tackle this final issue.

Adaptive Retry Strategy

The adaptive retry strategy bases the retry delay calculation on some value from the server’s response. This becomes exceptionally useful when we encounter rate limiting, because it prevents overloading the server during the rate-limited period. Servers usually respond with a Retry-After header that contains either the number of seconds to wait before retrying or an exact date and time.
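For illustration, the header can take either of these two forms (the values here are made up):

Retry-After: 5
Retry-After: Wed, 22 Oct 2025 07:28:00 GMT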

Let’s utilize Polly’s DelayGenerator property to implement our custom delay calculation. This property is a function that receives a RetryDelayGeneratorArguments parameter, which contains the exception thrown from our code. So, first, let’s create a RateLimitedException class to be able to pass information to our delay generator:

public class RateLimitedException(TimeSpan? retryAfter) : Exception
{
    public TimeSpan? RetryAfter { get; } = retryAfter;
}

We use a C# 12 primary constructor to make this class more concise. Next, let’s throw this exception when the server responds with the 429 HTTP status code:

await pipeline.ExecuteAsync(async cancellationToken =>
{
    var response = await new HttpClient().GetAsync("https://jsonplaceholder.typicode.com/users", cancellationToken);
    Console.WriteLine($"HTTP Status Code: {response.StatusCode}");
    if (response.StatusCode == HttpStatusCode.TooManyRequests) // requires using System.Net
        throw new RateLimitedException(response.Headers.RetryAfter?.Delta);
    response.EnsureSuccessStatusCode();
});

When the server returns no Retry-After header, we simply pass null to the exception. Now let’s update our RetryStrategyOptions with the delay generator function:

var retryOptions = new RetryStrategyOptions
{
    MaxRetryAttempts = 3,
    BackoffType = DelayBackoffType.Exponential,
    Delay = TimeSpan.FromSeconds(1),
    DelayGenerator = args =>
    {
        // Use the server-provided delay when available; returning null falls
        // back to the configured backoff strategy
        if (args.Outcome.Exception is RateLimitedException rateLimitedException)
            return ValueTask.FromResult(rateLimitedException.RetryAfter);
        return ValueTask.FromResult<TimeSpan?>(null);
    },
    UseJitter = true,
    OnRetry = args =>
    {
        Console.WriteLine($"OnRetry, Attempt: {args.AttemptNumber + 1}, Delay: {args.RetryDelay}");
        return default;
    }
};

When the delay generator function returns null, Polly will use the next delay value generated by the randomized exponential backoff strategy, as we configured earlier. So, when the server doesn’t tell us when to retry, we simply fall back to our next best strategy. Let’s run the application and take a look at the output:

HTTP Status Code: TooManyRequests
OnRetry, Attempt: 1, Delay: 00:00:05
HTTP Status Code: TooManyRequests
OnRetry, Attempt: 2, Delay: 00:00:05
HTTP Status Code: TooManyRequests
OnRetry, Attempt: 3, Delay: 00:00:05
HTTP Status Code: TooManyRequests
Unhandled exception. RetryHttpExceptionsWithPolly.RateLimitedException:
Exception of type 'RetryHttpExceptionsWithPolly.RateLimitedException' was thrown.

In the output, we can see that the randomized exponential backoff strategy’s delays were overridden by the 5-second delay returned in the Retry-After header. In most cases, this is the final strategy we will need: it protects the server from constant overload and load peaks, and it takes the server’s response into account. But let’s talk about one final strategy that is useful when we need reliability or want to fail fast.

Circuit Breaker Strategy

The circuit breaker strategy enables us to fail fast if we consider the server unhealthy. This is useful if we require a highly reliable service, especially when it’s currently overloaded. Imagine a personal finance application that shows account balances and makes bank transfers with an unreliable banking API. It is a better user experience to simply disable the whole banking functionality during times of instability than to allow the user to view their balance on the dashboard but then throw an exception when they try to make a transfer.

The circuit breaker strategy works by sampling the outcomes of the calls to the endpoint and calculating the failure ratio. If this ratio exceeds our given threshold, the circuit breaks and short-circuits execution: all subsequent invocations fail immediately for a predefined duration. When that duration expires, we probe the server. If it is healthy, the calls are forwarded to the server again, and the sampling restarts.

Circuit Breaker Implementation

Fortunately, implementing the circuit breaker strategy with Polly is as easy as the other retry strategies. Let’s create a CircuitBreakerStrategy class, similar to the RetryStrategy class:

public static class CircuitBreakerStrategy
{
    public static async Task ExecuteAsync()
    {
        
    }
}

We’ll use the circuit breaker in conjunction with a retry strategy because the circuit breaker doesn’t retry on its own; it just rethrows whatever exceptions occur. So let’s create a simple retry strategy and the pipeline test code:

var retryOptions = new RetryStrategyOptions
{
    MaxRetryAttempts = 20,
    Delay = TimeSpan.FromSeconds(1),
    OnRetry = args =>
    {
        Console.WriteLine($"OnRetry, Attempt: {args.AttemptNumber + 1}, Delay: {args.RetryDelay}");
        return default;
    }
};

var pipeline = new ResiliencePipelineBuilder()
    .AddRetry(retryOptions)
    .Build();

await pipeline.ExecuteAsync(async cancellationToken =>
{
    var response = await new HttpClient().GetAsync("https://jsonplaceholder.typicode.com/users", cancellationToken);
    Console.WriteLine($"HTTP Status Code: {response.StatusCode}");
    response.EnsureSuccessStatusCode();
});

We simply retry every second, up to 20 retry attempts. Now, let’s instantiate the CircuitBreakerStrategyOptions class and add it to the pipeline builder:

public static async Task ExecuteAsync()
{
    //omitted for brevity
    var circuitBreakerOptions = new CircuitBreakerStrategyOptions
    {
        FailureRatio = 0.1,
        MinimumThroughput = 5,
        SamplingDuration = TimeSpan.FromSeconds(5),
        BreakDuration = TimeSpan.FromSeconds(5),
        OnClosed = _ => { Console.WriteLine("Circuit Closed"); return default; },
        OnOpened = _ => { Console.WriteLine("Circuit Opened"); return default; },
        OnHalfOpened = _ => { Console.WriteLine("Circuit Half-Opened"); return default; }
    };

    var pipeline = new ResiliencePipelineBuilder()
        .AddRetry(retryOptions)
        .AddCircuitBreaker(circuitBreakerOptions)
        .Build();
}

We define the failure ratio as 0.1, which means that if more than 10% of the requests fail, the circuit breaks. The minimum throughput and sampling duration properties are connected; together, they determine when a sample is meaningful. If the pipeline executes at least 5 times within a 5-second window, the sample counts: Polly calculates the failure ratio and may break the circuit. However, if the pipeline executes only 4 times during the 5-second period, the circuit will never break, because we lack sufficient statistics to determine the service’s health. Let’s replace our RetryStrategy call in the Program class with the CircuitBreakerStrategy:

await CircuitBreakerStrategy.ExecuteAsync();

Finally, run the application:

HTTP Status Code: TooManyRequests
OnRetry, Attempt: 1, Delay: 00:00:01
HTTP Status Code: TooManyRequests
OnRetry, Attempt: 2, Delay: 00:00:01
HTTP Status Code: TooManyRequests
OnRetry, Attempt: 3, Delay: 00:00:01
HTTP Status Code: TooManyRequests
OnRetry, Attempt: 4, Delay: 00:00:01
HTTP Status Code: TooManyRequests
Circuit Opened
OnRetry, Attempt: 5, Delay: 00:00:01
OnRetry, Attempt: 6, Delay: 00:00:01
OnRetry, Attempt: 7, Delay: 00:00:01
OnRetry, Attempt: 8, Delay: 00:00:01
OnRetry, Attempt: 9, Delay: 00:00:01
Circuit Half-Opened
HTTP Status Code: TooManyRequests
Circuit Opened
//omitted for brevity

Initially, the circuit is closed (thus forwarding calls to the API), as we can see from the first 5 attempts. The fifth TooManyRequests response breaks the circuit, so it transitions into the open state, and the next 5 retries never reach the API. Then the circuit transitions into the half-open state, allowing a single probe request to pass through. If that request were successful, the circuit would transition into the closed state, forward requests to the API again, and restart the sampling.

However, in our case, the probe still fails. Thus, the circuit transitions back into the open state, not letting requests through for the next 5 seconds.

Conclusion

In this article, we reviewed the different reasons for HTTP request failures and the various strategies we might employ to make our applications more resilient.

We started by backing off and slowing down consecutive retry attempts. Then we distributed our retry attempts a bit more, to mitigate load peak generation. We also took the server’s response into account when determining the next retry delay. Finally, we examined the circuit breaker strategy, which ensures that we only utilize healthy services and fail fast if they are deemed unhealthy.

It’s worth noting that we don’t need the most complex solution in every situation. When integrating with an external API, start with the simplest solution – no retries at all. If we see transient failures in production, we should implement a simple retry and progress toward more complex solutions as needed. Implementing a more complicated strategy might seem simple, especially with Polly, but the architectural complexity is still there, and we must keep that in mind.
