In this article, we’re going to talk about Rate Limiting in ASP.NET Core and explore some ways of implementing it.

To download the source code for this article, you can visit our GitHub repository.

Let’s dive into it.

What Is Rate Limiting?

APIs expose certain resources and functionalities to a client who consumes those APIs. For example, a restaurant’s website integrates an API of a table reservation service to make reservations online.

Rate Limiting is the process of restricting the number of requests for a resource within a specific time window.

A service provider offering an API for consumers will have limitations on requests made within a specified window of time. For instance, each unique user/IP Address/Client key will have a limitation on the number of requests to an API endpoint.
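To make the definition concrete, here is a minimal sketch of a fixed-window counter in plain C#. It is not part of the article's source code: the FixedWindowLimiter class and its TryAcquire method are hypothetical names, the caller supplies the current time to keep the logic easy to follow, and the class is not thread-safe.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; not taken from the article's repository.
public class FixedWindowLimiter
{
    private readonly int _maxRequests;
    private readonly TimeSpan _window;

    // Per client key: when the current window started and how many requests it has seen.
    private readonly Dictionary<string, (DateTime Start, int Count)> _windows = new();

    public FixedWindowLimiter(int maxRequests, TimeSpan window)
    {
        _maxRequests = maxRequests;
        _window = window;
    }

    // Returns true if the request is allowed, false if the client is over its limit.
    public bool TryAcquire(string clientKey, DateTime now)
    {
        if (!_windows.TryGetValue(clientKey, out var state) || now - state.Start >= _window)
        {
            // First request from this client, or the previous window has expired.
            _windows[clientKey] = (now, 1);
            return true;
        }

        if (state.Count >= _maxRequests)
        {
            return false;
        }

        _windows[clientKey] = (state.Start, state.Count + 1);
        return true;
    }
}
```

With a limit of 2 requests per 5 seconds, the first two calls for a given key inside a window return true and the third returns false, while a different key keeps its own independent count.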

Why Do We Use Rate Limiting?

Public APIs use rate limiting for commercial purposes to generate revenue. A common business model is to charge a subscription fee for using the API, so consumers can only make a certain number of API calls before they have to pay for an upgraded plan.

Rate Limiting helps protect against malicious bot attacks. For example, a hacker can use bots to make repeated calls to an API endpoint, rendering the service unavailable for anyone else. This is known as a Denial of Service (DoS) attack.

Another purpose of rate limiting is to regulate traffic to the API according to infrastructure availability. Such usage is more relevant to the cloud-based API services that utilize a “pay as you go” IaaS strategy with cloud providers.

Demo Web API Application

Let’s use a Web API application in the eCommerce domain that performs simple CRUD operations on a list of products. The controller contains the corresponding action methods:

[ProducesResponseType(typeof(IEnumerable<Product>), StatusCodes.Status200OK)]
public IActionResult GetAllProducts()
{
    return Ok(_repo.GetAll());
}

[ProducesResponseType(typeof(Product), StatusCodes.Status200OK)]
public IActionResult GetProduct(Guid id)
{
    var product = _repo.GetById(id);
    return product is not null ? Ok(product) : NotFound();
}

The Web API app exposes two endpoints that return a list of products and the details of a single product, respectively. Let’s apply a limit on the number of requests to one of them: the endpoint that returns the list of all available products. This will most likely be a popular endpoint in terms of request traffic.

Note that you should never implement a method that returns all the results in your production application. It might result in performance degradation or, even worse, crashes in some cases. We use it for demo purposes only. You can check our article on paging in ASP.NET Core Web API to learn how to do it properly.

The ProductCatalogRepository is responsible for interaction with a persistent store for the products:

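public class ProductCatalogRepository : IProductCatalogRepository
{
    private readonly Dictionary<Guid, Product> _products = new();
    private Random _rnd = new Random();
    public ProductCatalogRepository() { /* seeding of sample products omitted */ }
    public List<Product> GetAll()
    {
        return _products.Values.ToList();
    }
    public Product GetById(Guid id)
    {
        // Return null for an unknown id so the controller can respond with 404.
        return _products.TryGetValue(id, out var product) ? product : null;
    }
}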


For simplicity, let’s use an in-memory dictionary as the store for the products. Repositories in an enterprise application would instead interact with a relational or non-relational data source.

Applying Rate Limiting Using a Custom Middleware

ASP.NET Core does not support Rate Limiting out of the box in .NET 6 (the one we are using for this article) and below. However, it is relatively easy to plug in a custom solution that implements this strategy. ASP.NET Core framework provides HTTP middleware extensibility options for this purpose.

Depending on the requirements, the API may apply throttling to all endpoints or only to specific ones. A convenient way to target specific endpoints is to use a decorator.

The Decorator

Let’s use an attribute to decorate the endpoint which we want to throttle:

[AttributeUsage(AttributeTargets.Method)]
public class LimitRequests : Attribute
{
    public int TimeWindow { get; set; }
    public int MaxRequests { get; set; }
}

This attribute applies only to methods. The two properties in the attribute indicate the max requests allowed within a specific time window. The attribute approach gives us the flexibility to apply different rate-limiting configurations for different endpoints within the same API.

Let’s apply the LimitRequests decorator to the /products endpoint and configure it to allow a maximum of 2 requests for a window of 5 seconds:

[ProducesResponseType(typeof(IEnumerable<Product>), StatusCodes.Status200OK)]
[LimitRequests(MaxRequests = 2, TimeWindow = 5)]
public IActionResult GetAllProducts()
{
    return Ok(_repo.GetAll());
}

Now the third request within the window of 5 seconds won’t return a successful response.

The Middleware

The RateLimitingMiddleware custom middleware contains the logic for rate limiting:

public async Task InvokeAsync(HttpContext context)
{
    var endpoint = context.GetEndpoint();
    var decorator = endpoint?.Metadata.GetMetadata<LimitRequests>();

    // No decorator on the endpoint: pass the request through untouched.
    if (decorator is null)
    {
        await _next(context);
        return;
    }

    var key = GenerateClientKey(context);
    var clientStatistics = await GetClientStatisticsByKey(key);

    // Limit already reached inside the current time window: reject with 429.
    if (clientStatistics != null &&
        DateTime.UtcNow < clientStatistics.LastSuccessfulResponseTime.AddSeconds(decorator.TimeWindow) &&
        clientStatistics.NumberOfRequestsCompletedSuccessfully == decorator.MaxRequests)
    {
        context.Response.StatusCode = (int)HttpStatusCode.TooManyRequests;
        return;
    }

    await UpdateClientStatisticsStorage(key, decorator.MaxRequests);
    await _next(context);
}

First, let’s check if the requested endpoint contains the LimitRequests decorator. If there is no decorator, the request passes straight to the next middleware in the pipeline and no limit is applied.

If there is a decorator at the endpoint, let’s generate a unique key. This key is the combination of the endpoint path and the IP Address of the client:

private static string GenerateClientKey(HttpContext context) 
    => $"{context.Request.Path}_{context.Connection.RemoteIpAddress}";

We can limit requests within a specified window of time based on the IP address, user id, or client key. Here, we’ve chosen the IP addresses as the client identifier. So, we have the flexibility to choose the strategy for rate limit identifier.
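As a rough illustration of that flexibility, the sketch below builds the key from whichever identifier is available, preferring a client key, then a user id, then the IP address. The ClientKeyStrategy class, its parameters, and the order of preference are assumptions made for this example and are not part of the article's middleware.

```csharp
using System;

// Hypothetical helper for illustration only.
public static class ClientKeyStrategy
{
    // Combines the endpoint path with the first identifier that is present.
    public static string Generate(string path, string clientId, string userId, string ipAddress)
    {
        var identifier = !string.IsNullOrEmpty(clientId) ? clientId
            : !string.IsNullOrEmpty(userId) ? userId
            : !string.IsNullOrEmpty(ipAddress) ? ipAddress
            : "unknown"; // fallback bucket shared by unidentified callers

        return $"{path}_{identifier}";
    }
}
```

Falling back to a shared "unknown" bucket is a design choice: it keeps unidentified callers limited as a group rather than letting them bypass the limit.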

Now, let’s use this key to get an instance of the ClientStatistics class from a distributed cache:

private async Task<ClientStatistics> GetClientStatisticsByKey(string key)
{
    return await _cache.GetCacheValueAsync<ClientStatistics>(key);
}

public class ClientStatistics
{
    public DateTime LastSuccessfulResponseTime { get; set; }
    public int NumberOfRequestsCompletedSuccessfully { get; set; }
}

The ClientStatistics instance is a record of the number of times the specific client was served a response and the time of the last successful response. Here, the middleware throttles the current request based on this data.

For a load-balanced API, ideally, we would store the client statistics data in a distributed cache like Redis or Memcached. However, for simplicity, let’s use an in-memory cache here.


Finally, let’s use the client statistics to check if the current request has crossed the maximum request limit within the time window for the endpoint. In such a scenario, the client receives a status code of 429. Then, the code updates the cache with the current client statistics for a successful request.
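The update step is not listed in this article, so the following is only a sketch of how the check and the update could fit together. It swaps the cache for a plain dictionary and takes the current time as a parameter so that it runs without ASP.NET Core. The InMemoryThrottle class, the Handle method, and the rule that resets the counter once the window has elapsed are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;

public class ClientStatistics
{
    public DateTime LastSuccessfulResponseTime { get; set; }
    public int NumberOfRequestsCompletedSuccessfully { get; set; }
}

// Hypothetical stand-in for the middleware's cache-backed logic.
public class InMemoryThrottle
{
    private readonly Dictionary<string, ClientStatistics> _store = new();

    // Returns 429 when the client is over the limit; otherwise records the request and returns 200.
    public int Handle(string key, int maxRequests, int timeWindowSeconds, DateTime utcNow)
    {
        _store.TryGetValue(key, out var stats);

        // Same condition as in the middleware: inside the window and at the limit.
        if (stats != null
            && utcNow < stats.LastSuccessfulResponseTime.AddSeconds(timeWindowSeconds)
            && stats.NumberOfRequestsCompletedSuccessfully == maxRequests)
        {
            return 429;
        }

        if (stats == null)
        {
            stats = new ClientStatistics();
            _store[key] = stats;
        }

        // Assumed update rule: start counting again once the window has elapsed.
        var windowElapsed = utcNow >= stats.LastSuccessfulResponseTime.AddSeconds(timeWindowSeconds);
        stats.NumberOfRequestsCompletedSuccessfully =
            windowElapsed ? 1 : stats.NumberOfRequestsCompletedSuccessfully + 1;
        stats.LastSuccessfulResponseTime = utcNow;

        return 200;
    }
}
```

Because the window is measured from the last successful response, this behaves slightly differently from a strict fixed window; a production implementation would also need to handle concurrent requests for the same key.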

The client receives a list of products with a status code of 200 if the number of requests does not violate the request limit for the endpoint.

API Response

Now, let’s send a request to get all the products:

curl -X 'GET' \
  'https://localhost:7090/products' \
  -H 'accept: application/json'

If our API works as expected, we get the list of products we have defined in the ProductCatalogRepository:

[
  {
    "id": "17dcb468-a2a5-480c-aad8-58646b5cf537",
    "name": "Sample Product 1",
    "price": 46,
    "rating": 1
  },
  {
    "id": "87b97a2b-6b51-49aa-9a74-5f4583f6182b",
    "name": "Sample Product 2",
    "price": 39,
    "rating": 3
  }
]

However, if we repeat the same request more often than the limit allows within the time window, we get a status code of 429:

Code  Details
429   Error: response status is 429

Applying Rate Limiting Using the AspNetCoreRateLimit NuGet Package

The AspNetCoreRateLimit package by Stefan Prodan and Cristi Pufu is an excellent open-source NuGet package that caters to most of the rate-limiting requirements in ASP.NET Core Web APIs.

This package contains an IpRateLimitMiddleware and a ClientRateLimitMiddleware to support the IP Address and client-key throttling strategies respectively. A variety of configuration options make this implementation very flexible.

Let’s continue using the IP address rate-limiting strategy in this example as well. After installing the package, let’s register the relevant services in the Program class:

builder.Services.AddSingleton<IIpPolicyStore, MemoryCacheIpPolicyStore>();
builder.Services.AddSingleton<IRateLimitCounterStore, MemoryCacheRateLimitCounterStore>();
builder.Services.AddSingleton<IRateLimitConfiguration, RateLimitConfiguration>();
builder.Services.AddSingleton<IProcessingStrategy, AsyncKeyLockProcessingStrategy>();

Here, we register the in-memory cache as the policy and request counter store for simplicity. This can easily be replaced with a distributed cache like Redis by installing the AspNetCoreRateLimit.Redis package. The service registration code also needs some changes to use Redis:

var redisOptions = ConfigurationOptions.Parse(builder.Configuration["ConnectionStrings:Redis"]);
builder.Services.AddSingleton<IConnectionMultiplexer>(provider => ConnectionMultiplexer.Connect(redisOptions));

Let’s add the configuration for the IP rate limit in the application’s appsettings.json file:

"IpRateLimiting": {
  "EnableEndpointRateLimiting": true,
  "StackBlockedRequests": false,
  "RealIPHeader": "X-Real-IP",
  "ClientIdHeader": "X-ClientId",
  "HttpStatusCode": 429,
  "GeneralRules": [
    {
      "Endpoint": "GET:/products",
      "Period": "5s",
      "Limit": 2
    }
  ]
}

EnableEndpointRateLimiting is set to true to ensure that throttling is applied to specific endpoints rather than to all endpoints.

Let’s set the rate-limiting rules in the GeneralRules section. In this case, the rule specifies that for the endpoint /products with an HTTP verb GET, allow only 2 requests in a time window of 5 seconds.

The format for the Endpoint setting is flexible as it supports pattern matching. For example, *:/products/* applies the rate-limiting rule to every endpoint whose route matches /products/*, irrespective of the HTTP verb.

Response Headers

Let’s send a request to the API endpoint that does not violate rate-limiting rules:

curl -X 'GET' \
  'https://localhost:7229/products' \
  -H 'accept: application/json'

The API endpoint returns a list of products and three response headers related to rate-limiting:

Response headers: 

content-type: application/json; charset=utf-8 
date: Fri, 21 Jan 2022 20:03:42 GMT
server: Kestrel 
x-rate-limit-limit: 5s 
x-rate-limit-remaining: 1 
x-rate-limit-reset: 2022-01-21T20:03:47.3445992Z

The first two response headers (x-rate-limit-limit and x-rate-limit-remaining) show the rate limit time window and the number of requests still allowed within that window, respectively. The third header displays the timestamp at which the limit resets.

In case of a failure response where a status code of 429 is returned (rate-limiting rules are violated), the headers include a key-value pair retry-after: 4, which instructs the client to retry after 4 seconds to avoid the violation of rate-limiting rules:

Server Response: 

Code  Details 
429   Error: response status is 429 

Response body: 

API calls quota exceeded! maximum admitted 2 per 5s. 

Response headers: 

content-type: text/plain   
date: Fri, 21 Jan 2022 20:07:01 GMT
retry-after: 4   
server: Kestrel 
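On the consuming side, a client can read the retry-after header to decide how long to wait before sending the request again. The helper below is a sketch of that idea using the standard HttpResponseMessage type. RetryAfterHelper and GetRetryDelay are hypothetical names, and the current time is passed in as a parameter only to keep the example deterministic.

```csharp
using System;
using System.Net;
using System.Net.Http;

// Hypothetical client-side helper for illustration only.
public static class RetryAfterHelper
{
    // Returns how long to wait before retrying, or null if the response is not a 429
    // or carries no usable retry-after header.
    public static TimeSpan? GetRetryDelay(HttpResponseMessage response, DateTimeOffset now)
    {
        if (response.StatusCode != HttpStatusCode.TooManyRequests)
        {
            return null;
        }

        var retryAfter = response.Headers.RetryAfter;

        if (retryAfter?.Delta is TimeSpan delta)
        {
            // Header given as a number of seconds, e.g. "retry-after: 4".
            return delta;
        }

        if (retryAfter?.Date is DateTimeOffset date)
        {
            // Header given as an HTTP date; never return a negative delay.
            return date > now ? date - now : TimeSpan.Zero;
        }

        return null;
    }
}
```

A caller could pass the result to Task.Delay before resending the request, and fall back to its own back-off policy when the method returns null.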

For further details about all the available configurations for IP rate-limiting as well as client-key rate-limiting strategy, please refer to the GitHub link for the AspNetCoreRateLimit package.

Rate Limiting vs Throttling

This article uses the terms “Rate-Limiting” and “Throttling” interchangeably. However, it is important to point out the differences. The term Rate-Limiting refers to the broader concept of restricting the request traffic to an API endpoint at any point in time. Throttling is a particular process of applying rate-limiting to an API endpoint.

There are other ways an API endpoint can apply rate-limiting. One such way is the use of Request Queues. This process queues the incoming requests. It then serves them to the API endpoint at a rate that the API can process gracefully.
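The queueing idea can be sketched as a small scheduler that hands requests to the API no faster than one per fixed interval. This is a simplified model rather than a working request pipeline: the PacedQueue class and its Schedule method are hypothetical, and the sketch only computes when each request would be served instead of actually holding it.

```csharp
using System;

// Hypothetical model of a request queue for illustration only.
public class PacedQueue
{
    private readonly TimeSpan _interval;
    private DateTime _nextFreeSlot = DateTime.MinValue;

    public PacedQueue(TimeSpan interval)
    {
        _interval = interval;
    }

    // Returns the time at which a request arriving at 'arrival' is handed to the API.
    public DateTime Schedule(DateTime arrival)
    {
        // Serve immediately if the API is idle, otherwise wait for the next free slot.
        var start = arrival > _nextFreeSlot ? arrival : _nextFreeSlot;
        _nextFreeSlot = start + _interval;
        return start;
    }
}
```

With an interval of one second, three requests arriving at the same instant are served one second apart, and no request is rejected. The trade-off is added latency for the queued callers.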

However, in Throttling, the API endpoint returns a status code telling the consumer that it cannot send any more requests within the specific time window. The client application can then retry after the time window passes. In the examples used in this article, only the throttling process has been showcased.


Conclusion

The AspNetCoreRateLimit package is sufficient to cater to most of the rate-limiting business requirements. It is very flexible and provides a host of configuration options. It is also easy to plug into the Web API with little more than service registration and configuration.

However, there may be some custom requirements associated with rate limiting that are not covered by the package. In that case, the best way to implement them is using custom middleware. The custom middleware approach gives developers complete control over the rate-limiting mechanism that is applied to the endpoints.