API Gateway Throttling: What It Is and When to Use It

Definition

Amazon API Gateway throttling is a feature that limits the rate at which users can make API requests, protecting backend services from being overwhelmed by too much traffic. It works by enforcing limits on the number of requests per second, helping to ensure the availability and reliability of your APIs for all users.

How It Works

API Gateway uses the token bucket algorithm to manage request rates. Imagine a bucket that holds a specific number of tokens. Each incoming API request takes one token from the bucket. The bucket is refilled with tokens at a steady, configurable rate (the rate limit). The total number of tokens the bucket can hold is the burst limit, which allows for a temporary spike in traffic beyond the steady-state rate. If a request arrives when the bucket is empty, the request is throttled, and API Gateway returns a 429 Too Many Requests error to the client.
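The token bucket described above can be sketched in a few lines of Python. This is only an illustrative model, not API Gateway's actual implementation: the class name TokenBucket, the allow method, and the rate and burst values are assumptions chosen for the example, and time is passed in explicitly so the behavior is easy to follow.

```python
class TokenBucket:
    """Illustrative token bucket; names and values are hypothetical."""

    def __init__(self, rate: float, burst: float) -> None:
        self.rate = rate        # tokens added per second (steady-state rate limit)
        self.burst = burst      # bucket capacity (burst limit)
        self.tokens = burst     # start with a full bucket
        self.last = 0.0         # time of the previous request, in seconds

    def allow(self, now: float) -> bool:
        """Return True if a request arriving at time `now` is admitted."""
        elapsed = max(0.0, now - self.last)
        # Refill at the steady rate, but never beyond the burst capacity.
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # a gateway would answer with HTTP 429 here


bucket = TokenBucket(rate=2.0, burst=5.0)
# Seven requests arrive at t=0: five drain the bucket, two are throttled.
results_at_t0 = [bucket.allow(0.0) for _ in range(7)]
# One second later, two tokens have been refilled.
results_at_t1 = [bucket.allow(1.0) for _ in range(3)]
```

In this sketch the burst value controls how many back-to-back requests succeed, while the rate value controls how quickly capacity returns afterwards.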

Throttling rules are applied in a specific order of precedence, from most specific to most general:

  1. Per-Client, Per-Method Limits (Usage Plans): The most granular level, these limits are defined in a Usage Plan and associated with specific API keys. This allows you to set different rate limits for different customers (e.g., a 'Gold' tier gets a higher limit than a 'Silver' tier).
  2. Per-Method Limits (Stage Configuration): You can set throttling limits for a specific method (e.g., GET /items) within an API stage. These limits apply to the combined traffic from all callers of that method, whether or not they are associated with a usage plan.
  3. Stage-Level Limits: You can set an overall limit for an entire API stage (e.g., 'prod' or 'dev'). This limit is shared across all methods within that stage unless a more specific per-method limit is defined.
  4. Account-Level Limits: AWS imposes a default, account-wide throttling limit per Region across all of your APIs to protect the service itself. This is a safety net that can be increased by contacting AWS Support.

If a request is allowed by a more specific rule (like a usage plan), it must still pass the more general rules (like the account-level limit).
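The idea that a request must pass every applicable level can be illustrated with a deliberately simplified model. The function name admit, the level names, and the token counts below are all hypothetical, and the sketch omits refill entirely. It also only deducts tokens when every level admits the request, which is an assumption of this example rather than documented API Gateway behavior.

```python
from typing import Dict, Optional


def admit(levels: Dict[str, int]) -> Optional[str]:
    """Return None if admitted, else the name of the first level with no tokens.

    `levels` maps a level name to its remaining tokens, ordered from most
    specific to most general.
    """
    for name, tokens in levels.items():
        if tokens < 1:
            return name  # this level rejects the request (HTTP 429)
    # Every level has capacity, so the request consumes one token from each.
    for name in levels:
        levels[name] -= 1
    return None


# Hypothetical remaining capacity at each level for one instant in time.
levels = {"usage_plan": 3, "method": 10, "account": 2}
outcomes = [admit(levels) for _ in range(4)]
```

Here the third and fourth requests are rejected at the account level even though the usage plan still has capacity, which is the same situation the hierarchy can produce when a general limit is exhausted first.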

Key Features and Limits

  • Algorithm: Uses the token bucket algorithm to enforce both a steady-state rate and a burst capacity.
  • Default Account Quota (as of 2026): The default account-level quota is 10,000 requests per second (RPS) with a burst capacity of 5,000 requests across all APIs in a Region. Some newer regions may have lower default limits. This limit can be increased upon request.
  • Configurability: Throttling can be configured at multiple levels: per-method, per-stage, and per-client via Usage Plans.
  • Usage Plans: For REST APIs, Usage Plans allow you to bundle API stages and configure throttling limits and quotas (maximum requests over a period like a day or month) for specific clients using API keys.
  • Error Response: When a limit is exceeded, API Gateway returns a 429 Too Many Requests HTTP status code.
  • Best-Effort Basis: Throttling and quotas are applied on a best-effort basis and should be considered targets rather than guaranteed ceilings, as the distributed nature of API Gateway can sometimes allow brief overruns.

Common Use Cases

  • Protecting Backend Services: Prevents downstream services like AWS Lambda, Amazon EC2, or on-premises databases from being overloaded by sudden traffic spikes, ensuring stability.
  • Ensuring Fair Usage: In multi-tenant applications, you can use Usage Plans and API keys to prevent a single "noisy neighbor" from consuming all available resources and degrading the service for other users.
  • Monetizing APIs: By creating different Usage Plans with tiered rate limits and quotas (e.g., Free, Basic, Premium), you can offer different levels of service as product offerings.
  • Improving System Reliability: Throttling is a key practice for building resilient, well-architected systems by regulating inbound traffic to match what your system can handle.
  • Managing Costs: By limiting the number of requests, you can control the costs associated with backend service invocations (e.g., Lambda function executions) and API Gateway's own request-based pricing.
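As a rough sketch of the tiered Usage Plan idea, the snippet below builds the parameters for one plan per tier. The tier numbers, the helper name usage_plan_params, and the API ID and stage values are hypothetical. The dictionary keys follow the shape accepted by the boto3 API Gateway client's create_usage_plan call, but no AWS call is made here, so verify the parameters against the current boto3 documentation before using them.

```python
from typing import Any, Dict

# Hypothetical tier definitions; real limits depend on your backend capacity.
TIERS: Dict[str, Dict[str, Any]] = {
    "Free":    {"rate": 5.0,   "burst": 10,  "monthly_quota": 10_000},
    "Basic":   {"rate": 50.0,  "burst": 100, "monthly_quota": 1_000_000},
    "Premium": {"rate": 500.0, "burst": 1000, "monthly_quota": 50_000_000},
}


def usage_plan_params(tier: str, api_id: str, stage: str) -> Dict[str, Any]:
    """Build keyword arguments for a usage plan covering one API stage."""
    t = TIERS[tier]
    return {
        "name": f"{tier}-plan",
        "apiStages": [{"apiId": api_id, "stage": stage}],
        # Throttle: steady-state requests per second and burst capacity.
        "throttle": {"rateLimit": t["rate"], "burstLimit": t["burst"]},
        # Quota: total requests allowed over the longer period.
        "quota": {"limit": t["monthly_quota"], "period": "MONTH"},
    }


# "abc123" and "prod" are placeholder values for illustration only.
premium = usage_plan_params("Premium", "abc123", "prod")
# With credentials configured, this could be passed along as:
#   boto3.client("apigateway").create_usage_plan(**premium)
```

Each API key is then associated with exactly one of these plans, so the tier a customer buys determines both their rate limit and their quota.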

Pricing Model

API Gateway throttling is a built-in feature and does not have a separate, direct cost. However, its configuration directly impacts your overall AWS bill. You are charged for the number of API calls you receive and the amount of data transferred out.

By preventing excessive requests, throttling helps you control the costs of both API Gateway itself and the backend services it invokes. The API Gateway free tier includes one million API calls per month for the first 12 months for new accounts.

For detailed pricing, always refer to the official Amazon API Gateway Pricing page.

Pros and Cons

Pros:

  • Managed Protection: Provides a fully managed, serverless way to protect backend systems without needing to implement and maintain custom rate-limiting logic.
  • Granular Control: Offers multiple levels of control, from broad account-level limits to fine-grained per-client, per-method rules.
  • Improved Stability and Availability: Prevents traffic spikes from causing cascading failures in downstream services.
  • Enables Business Logic: Allows for the creation of tiered API products with different service levels for different customers.

Cons:

  • Configuration Complexity: The hierarchy of rules (Usage Plan, method, stage, account) can be complex to manage and troubleshoot.
  • REST API Focus for Granularity: The most granular controls using Usage Plans and API keys are primarily a feature of REST APIs, while HTTP APIs have more limited, route-level throttling options.
  • Potential to Block Legitimate Traffic: If not configured carefully, overly aggressive throttling can reject valid user requests, impacting user experience.
  • Best-Effort Nature: Since limits are not a guaranteed hard ceiling, some overage can occur, which might be an issue for extremely sensitive backends.

Comparison with Alternatives

  • AWS WAF Rate-Based Rules: AWS WAF (Web Application Firewall) can be used in front of API Gateway to provide rate limiting based on source IP addresses and other request characteristics (headers, URI paths) over configurable time windows (e.g., 1, 2, 5, or 10 minutes). WAF is excellent for mitigating web-layer DDoS attacks and blocking malicious traffic patterns, while API Gateway throttling is better suited for managing application-level usage, ensuring fair use among clients, and protecting backend capacity.
  • Custom Logic in Backend (e.g., AWS Lambda): You could implement your own rate-limiting logic within your Lambda function or application code, using a service like Amazon ElastiCache or Amazon DynamoDB to track request counts. This offers maximum flexibility but comes with significantly higher operational overhead, complexity, and cost compared to using the built-in API Gateway feature.

Exam Relevance

API Gateway throttling is a key topic in several AWS certification exams, particularly those focused on development and architecture.

  • AWS Certified Developer - Associate (DVA-C02): Expect questions on how to use Usage Plans, API keys, and throttling limits to protect backend services and manage API access for different clients.
  • AWS Certified Solutions Architect - Associate (SAA-C03): Questions often focus on choosing the right strategy to protect a backend from being overwhelmed, where API Gateway throttling is a primary solution. You should understand the different levels of throttling and when to use them.
  • AWS Certified Solutions Architect - Professional (SAP-C02): Professionals are expected to have a deep understanding of the throttling hierarchy, how it interacts with other services like AWS WAF, and how to design resilient, multi-tenant architectures using these features.

Frequently Asked Questions

Q: What is the difference between throttling and quotas in API Gateway?

A: Throttling limits the rate of requests, measured in requests per second (RPS), to protect against short-term bursts of traffic. Quotas, configured within a Usage Plan, limit the total number of requests a client can make over a longer period, such as a day, week, or month. Throttling is for availability, while quotas are for managing overall usage and billing.

Q: How should my application handle a 429 'Too Many Requests' error?

A: When a client receives a 429 error, it should not retry immediately. The best practice is to retry with exponential backoff and jitter: wait for a progressively longer, randomized period before each retry, which prevents a large number of clients from retrying simultaneously and worsening the traffic storm.
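A minimal client-side sketch of this retry pattern follows. It uses the "full jitter" variant, where each delay is drawn uniformly between zero and an exponentially growing ceiling. The function names, the base and cap values, and the send callable (which stands in for whatever HTTP client you use and returns a status code and body) are assumptions made for illustration.

```python
import random
import time
from typing import Callable, Tuple


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0,
                  rng: Callable[[], float] = random.random) -> float:
    """Full jitter: a random delay between 0 and min(cap, base * 2**attempt)."""
    return rng() * min(cap, base * (2 ** attempt))


def call_with_retries(send: Callable[[], Tuple[int, str]],
                      max_attempts: int = 5,
                      sleep: Callable[[float], None] = time.sleep) -> Tuple[int, str]:
    """Call `send` until it returns a non-429 status or attempts run out."""
    status, body = 0, ""
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            # Wait a randomized, progressively longer period before retrying.
            sleep(backoff_delay(attempt))
    return status, body  # still throttled after the final attempt
```

Passing sleep and rng as parameters keeps the sketch easy to test without real waiting. In production code you would typically also cap the total retry time and honor a Retry-After header if one is present.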

Q: My API is getting 429 errors, but my configured limits don't seem to be reached. What could be the cause?

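A: This can happen for several reasons due to the throttling hierarchy. The request might be blocked by a more general limit you are not monitoring, such as the overall account-level RPS limit for the Region, which is shared by all of your APIs in that Region. It could also be that a backend integration, like an AWS Lambda function, is reaching its own concurrency limit. Lambda then rejects the invocation with a 429, and depending on the integration type and response mapping, the client may see this as a 429 or as a 5xx error from API Gateway. Checking the API Gateway execution logs helps identify where the rejection originated.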


This article reflects AWS features and pricing as of 2026. AWS services evolve rapidly — always verify against the official AWS documentation before making production decisions.

Published: 5/16/2026 / Updated: 5/16/2026

