EC2 Spot Instances: Maximum Savings for Fault-Tolerant Workloads
Definition
EC2 Spot Instances let you use spare EC2 capacity at up to 90% off On-Demand prices. The trade-off: AWS can reclaim (interrupt) your Spot Instance with a 2-minute warning whenever it needs the capacity back. Spot Instances are ideal for workloads that are fault-tolerant, stateless, or flexible in timing — batch processing, CI/CD pipelines, data analytics, containerized microservices, and machine learning training.
Spot is not a separate instance type — it is a purchasing option for standard EC2 instances. You get the same hardware, networking, and performance as On-Demand; only the pricing model and availability guarantee differ.
How It Works
Spot pricing fluctuates based on supply and demand in each capacity pool — a combination of instance type, AZ, and OS. Unlike the old bidding model (retired in 2017), you now simply request Spot Instances and pay the current Spot price, which is typically 60–90% below On-Demand.
Interruption behavior: when AWS needs capacity back, it sends a 2-minute warning via the instance metadata service and (optionally) an EventBridge event. You can configure the interruption behavior to terminate, stop, or hibernate the instance.
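As a minimal sketch of how an instance might watch for that warning, the snippet below polls the `spot/instance-action` metadata path and works out how much time is left. It assumes IMDSv2 (a session token is requested first) and the documented notice shape of an `action` plus a UTC `time`; the function names and the timeout value are illustrative, not part of any AWS SDK.

```python
import json
import urllib.error
import urllib.request
from datetime import datetime, timezone

IMDS = "http://169.254.169.254/latest"


def parse_instance_action(body, now=None):
    """Return (action, seconds_remaining) from a spot/instance-action document."""
    doc = json.loads(body)
    deadline = datetime.strptime(doc["time"], "%Y-%m-%dT%H:%M:%SZ")
    deadline = deadline.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return doc["action"], (deadline - now).total_seconds()


def fetch_instance_action(timeout=1.0):
    """Return the raw notice body, or None when no interruption is scheduled."""
    # IMDSv2: obtain a short-lived session token, then send it with the read.
    token_req = urllib.request.Request(
        f"{IMDS}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    with urllib.request.urlopen(token_req, timeout=timeout) as resp:
        token = resp.read().decode()

    req = urllib.request.Request(
        f"{IMDS}/meta-data/spot/instance-action",
        headers={"X-aws-ec2-metadata-token": token},
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.read().decode()
    except urllib.error.HTTPError as err:
        if err.code == 404:  # the path does not exist until a notice is issued
            return None
        raise
```

A worker would typically call `fetch_instance_action()` every few seconds and start its shutdown routine as soon as the result is not `None`. The parsing step is kept separate so it can be exercised off-instance.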
Requesting Spot capacity:
- Spot Instance request: single request for one or more instances of a specific type.
- Spot Fleet: request a target capacity across multiple instance types and AZs. Spot Fleet automatically launches the cheapest or most diversified combination. Supports On-Demand instances as a baseline within the fleet.
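- EC2 Fleet: similar to Spot Fleet, launching On-Demand and Spot capacity together in a single API call. Reserved Instance and Savings Plans discounts are billing constructs, so they apply automatically to matching On-Demand usage rather than being launched by the fleet. The newer and more flexible option.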
- EC2 Auto Scaling with mixed instances: an Auto Scaling Group can combine On-Demand and Spot instances with capacity-optimized or lowest-price allocation strategies.
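The mixed-instances option above can be sketched as a small helper that builds the `MixedInstancesPolicy` structure accepted by the Auto Scaling `CreateAutoScalingGroup` API. The helper name, the default values, and the launch template name in the usage note are assumptions for illustration; only the two allocation strategies named in this article are allowed here, although the API accepts others.

```python
def mixed_instances_policy(launch_template_name, instance_types,
                           on_demand_base=2, on_demand_pct_above_base=0,
                           spot_strategy="capacity-optimized"):
    """Build a MixedInstancesPolicy dict: On-Demand baseline plus Spot above it."""
    allowed = {"lowest-price", "capacity-optimized"}  # strategies named in the article
    if spot_strategy not in allowed:
        raise ValueError(f"unsupported strategy: {spot_strategy}")
    if not 0 <= on_demand_pct_above_base <= 100:
        raise ValueError("percentage must be between 0 and 100")
    if not instance_types:
        raise ValueError("at least one instance type is required")

    return {
        "LaunchTemplate": {
            "LaunchTemplateSpecification": {
                "LaunchTemplateName": launch_template_name,
                "Version": "$Latest",
            },
            # One override per instance type widens the set of Spot pools.
            "Overrides": [{"InstanceType": t} for t in instance_types],
        },
        "InstancesDistribution": {
            "OnDemandBaseCapacity": on_demand_base,
            "OnDemandPercentageAboveBaseCapacity": on_demand_pct_above_base,
            "SpotAllocationStrategy": spot_strategy,
        },
    }
```

With boto3 installed, the result could be passed as the `MixedInstancesPolicy` argument of the Auto Scaling client's `create_auto_scaling_group` call, for example `mixed_instances_policy("web-template", ["m5.large", "m5a.large", "m6i.large"])`, where `web-template` is a hypothetical launch template.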
Key Features and Limits
- Savings: typically 60–90% off On-Demand, varying by instance type and Region.
- 2-minute interruption warning: delivered via instance metadata (http://169.254.169.254/latest/meta-data/spot/instance-action) and EventBridge.
- Interruption behaviors: terminate (default), stop (for EBS-backed), or hibernate.
- Capacity pools: diversify across 10+ instance types and 3+ AZs to minimize interruption risk.
- Spot placement score: API that helps you find Regions and AZs with the highest Spot capacity for your requirements.
- Spot Fleet strategies: lowestPrice (cheapest pools first), capacityOptimized (pools with most available capacity, recommended), diversified (even spread across pools).
- No capacity guarantee: Spot can be interrupted at any time. Not suitable for workloads that cannot tolerate interruption.
- Max Spot price: you can set a max price (defaults to On-Demand price). If the Spot price exceeds your max, your instance is interrupted.
- Persistent vs one-time requests: persistent requests automatically relaunch interrupted instances when capacity returns.
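To make the difference between the three Spot Fleet strategies concrete, here is a toy model. It is not AWS's actual placement algorithm: the `Pool` class, the prices, and the spare-capacity figures are all invented for illustration (AWS does not publish per-pool spare capacity), and the real service weighs more factors than this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pool:
    name: str            # hypothetical label: instance type plus AZ
    price: float         # illustrative Spot price, USD per hour
    spare_capacity: int  # illustrative figure, not published by AWS


def allocate(pools, target, strategy):
    """Return {pool name: instance count} under a simplified strategy."""
    if not pools or target < 0:
        raise ValueError("need at least one pool and a non-negative target")
    if strategy == "lowestPrice":
        best = min(pools, key=lambda p: p.price)
        return {best.name: target}
    if strategy == "capacityOptimized":
        best = max(pools, key=lambda p: p.spare_capacity)
        return {best.name: target}
    if strategy == "diversified":
        # Spread evenly; the first pools absorb any remainder.
        share, extra = divmod(target, len(pools))
        return {p.name: share + (1 if i < extra else 0)
                for i, p in enumerate(pools)}
    raise ValueError(f"unknown strategy: {strategy}")


pools = [
    Pool("m5.large/az-a", 0.03, 10),
    Pool("m5a.large/az-b", 0.04, 50),
    Pool("m6i.large/az-c", 0.05, 20),
]
```

In this model the cheapest pool is also the shallowest one, so `lowestPrice` concentrates all ten instances where an interruption is most likely, while `capacityOptimized` pays slightly more per hour for the deepest pool.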
Common Use Cases
- Batch processing — MapReduce, ETL pipelines, video transcoding, genomics processing. Work is checkpointed; interrupted jobs resume from the last checkpoint.
- CI/CD pipelines — build and test jobs are short-lived and easily retriable. Jenkins, GitHub Actions, and GitLab CI all support Spot-backed runners.
- Containerized workloads — EKS and ECS on Spot with Karpenter or Cluster Autoscaler. Kubernetes handles pod rescheduling on interruption.
- Machine learning training — SageMaker managed Spot training saves up to 90%. Checkpointing to S3 protects against interruption.
- Stateless web workers — application servers behind an ALB in an Auto Scaling Group with mixed instances (On-Demand baseline + Spot burst).
- Big data analytics — EMR on Spot for Spark/Hadoop workloads. Task nodes on Spot, core/master on On-Demand.
- High-performance computing (HPC) — tightly coupled or embarrassingly parallel workloads that can checkpoint.
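Several of these use cases depend on checkpointing, which can be sketched as follows. This is a simplified illustration that writes the checkpoint to a local JSON file as a stand-in for S3; the function names and the `should_stop` hook (which a real job might wire to an interruption-notice check) are assumptions.

```python
import json
import os
import tempfile


def load_checkpoint(path):
    """Return the index of the next unprocessed item (0 if no checkpoint exists)."""
    try:
        with open(path) as f:
            return json.load(f)["next_index"]
    except FileNotFoundError:
        return 0


def save_checkpoint(path, next_index):
    """Write the checkpoint atomically so a kill mid-write cannot corrupt it."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump({"next_index": next_index}, f)
    os.replace(tmp, path)


def run_job(items, path, process, should_stop=lambda: False):
    """Process items in order, checkpointing after each; return True when done."""
    start = load_checkpoint(path)
    for i in range(start, len(items)):
        if should_stop():
            return False  # interrupted: a later run resumes at index i
        process(items[i])
        save_checkpoint(path, i + 1)
    return True
```

If the instance disappears between `process` and `save_checkpoint`, the next run repeats that one item, which is why the processing step should be idempotent.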
Pricing Model
- Spot price: fluctuates per capacity pool but is typically 60–90% below On-Demand. Since AWS moved away from the auction model, prices adjust gradually with longer-term supply and demand rather than spiking with individual bids.
- Per-second billing: same as On-Demand — charged per second with a 60-second minimum.
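- If AWS interrupts you: you are not charged if the interruption happens during the instance's first hour. After the first hour, you pay for the seconds used up to the interruption (operating systems billed by the hour follow hourly rules instead).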
- If you stop/terminate: you are charged for the seconds used.
- Data transfer and EBS: standard rates apply — Spot pricing only covers the instance compute cost.
- Spot Fleet/EC2 Fleet: no additional charge for the fleet management; you pay only for the instances launched.
Combine with Savings Plans for maximum savings: Savings Plans cover your steady baseline (On-Demand at discounted rates), and Spot covers burst capacity. This hybrid approach can reduce total compute cost by 60–80%.
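The arithmetic behind the billing rules and the hybrid approach can be worked through with a short sketch. Every rate and discount below is a made-up illustrative number, not an AWS price, and the function names are hypothetical.

```python
def billed_seconds(runtime_seconds):
    """Per-second billing with a 60-second minimum."""
    return max(60, runtime_seconds)


def instance_cost(hourly_rate, runtime_seconds):
    """Compute cost in USD for one instance run at a given hourly rate."""
    return hourly_rate * billed_seconds(runtime_seconds) / 3600


def blended_savings(on_demand_rate, baseline_hours, burst_hours,
                    sp_discount, spot_discount):
    """Fraction saved versus running every instance-hour at the On-Demand rate."""
    all_on_demand = on_demand_rate * (baseline_hours + burst_hours)
    hybrid = on_demand_rate * (
        baseline_hours * (1 - sp_discount)   # baseline covered by Savings Plans
        + burst_hours * (1 - spot_discount)  # burst capacity on Spot
    )
    return 1 - hybrid / all_on_demand


# Illustrative inputs: $0.10/hour On-Demand, 500 baseline hours at a 40%
# Savings Plans discount, 1,500 burst hours at a 70% Spot discount.
savings = blended_savings(0.10, 500, 1500, sp_discount=0.40, spot_discount=0.70)
```

With these assumed inputs the all-On-Demand bill is $200 and the hybrid bill is $75, a saving of 62.5%. The result moves with the share of hours that can run on Spot, so a workload with a larger steady baseline saves less.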
Pros and Cons
Pros
- Deepest discount available — up to 90% off On-Demand.
- Same instance performance as On-Demand (same hardware, same networking).
- Spot Fleet and EC2 Fleet automate diversification and capacity management.
- 2-minute warning enables graceful shutdown and checkpointing.
- Per-second billing, with no charge when AWS interrupts an instance during its first hour.
- Integrates natively with Auto Scaling, ECS, EKS, EMR, and SageMaker.
Cons
- Instances can be interrupted at any time — not suitable for stateful or latency-critical workloads.
- 2-minute warning may not be enough for complex graceful shutdown procedures.
- Requires architectural patterns for fault tolerance (checkpointing, idempotency, statelessness).
- Capacity is not guaranteed — popular instance types in popular AZs may have frequent interruptions.
- Monitoring and managing Spot interruptions adds operational complexity.
- Should not be used for databases, primary DNS, or any single-instance critical service: nothing technically prevents it, but an interruption takes the service down.
Comparison with Alternatives
| | Spot Instances | On-Demand | Savings Plans | Reserved Instances |
| --- | --- | --- | --- | --- |
| Discount | Up to 90% | 0% | Up to 72% | Up to 72% |
| Commitment | None | None | 1 or 3 years ($/hour) | 1 or 3 years (instance config) |
| Interruptible | Yes (2-min warning) | No | No | No |
| Capacity guarantee | No | Yes | No (billing only) | Yes (Zonal RIs) |
| Best for | Fault-tolerant batch, CI/CD, containers | Unpredictable or short-term workloads | Steady baseline compute | RDS, ElastiCache, Redshift |
| Can combine with | On-Demand + Savings Plans baseline | Spot for burst | Spot for burst + On-Demand for spikes | Savings Plans |
Optimal cost architecture: Savings Plans for baseline → Spot for fault-tolerant capacity → On-Demand for the remainder.
Exam Relevance
- Cloud Practitioner (CLF-C02) — know that Spot offers the deepest discount but instances can be interrupted. Know it is for fault-tolerant workloads.
- Solutions Architect Associate (SAA-C03) — frequent scenario: "batch processing needs the lowest cost" → Spot Instances. Know Spot Fleet, capacity-optimized allocation, diversification across instance types. Know that Spot is not for databases or single-instance workloads.
- Solutions Architect Professional (SAP-C02) — advanced: EC2 Fleet with mixed instances, Spot interruption handling with EventBridge + Lambda, combining Spot with Savings Plans and On-Demand in a cost-optimized architecture.
Exam trap: if the question says "the workload cannot tolerate any interruption," Spot is never the answer — even if cost is the primary concern.
Frequently Asked Questions
Q: What happens when a Spot Instance is interrupted?
A: AWS sends a 2-minute warning via the instance metadata service and an EventBridge event. Depending on your configuration, the instance is terminated, stopped, or hibernated. Best practice: use the 2-minute window to checkpoint work to S3 or SQS, deregister from the load balancer, and complete in-flight requests. If AWS interrupts the instance during its first hour, you are not charged for that usage; after the first hour, you pay for the seconds used.
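One way to react to the EventBridge event is a small Lambda function, sketched below. The rule pattern uses the `aws.ec2` source and the `EC2 Spot Instance Interruption Warning` detail type that AWS emits; the step names are hypothetical placeholders, and a real handler would replace them with calls to the load balancer, the queue, or the storage service.

```python
import json

# EventBridge rule pattern that matches Spot interruption warnings.
INTERRUPTION_PATTERN = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Spot Instance Interruption Warning"],
}


def shutdown_plan(event):
    """Return ordered (step, instance_id) pairs for one interruption event."""
    if event.get("detail-type") != "EC2 Spot Instance Interruption Warning":
        return []
    instance_id = event["detail"]["instance-id"]
    # Hypothetical step names; deregistering first stops new requests arriving.
    return [
        ("deregister_from_load_balancer", instance_id),
        ("complete_in_flight_requests", instance_id),
        ("checkpoint_work", instance_id),
    ]


def handler(event, context):
    """Lambda entry point: logs the plan; real steps would call AWS APIs here."""
    plan = shutdown_plan(event)
    print(json.dumps(plan))
    return {"steps": len(plan)}
```

Keeping the plan as plain data makes it testable without AWS access, and the whole sequence still has to fit inside the 2-minute window.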
Q: How do I minimize Spot interruptions?
A: Diversify across at least 10 instance types and 3+ Availability Zones using Spot Fleet or EC2 Fleet with the capacityOptimized allocation strategy. This strategy places instances in pools with the most available capacity, reducing interruption probability. Avoid relying on a single popular instance type in a single AZ. Use the Spot Placement Score API to find optimal Regions and AZs.
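A rough sketch of working with Spot placement scores follows. `placement_request` builds arguments in the shape accepted by the EC2 `GetSpotPlacementScores` API, and `best_placements` ranks a response of the documented shape (a `SpotPlacementScores` list with `Region` and a `Score` from 1 to 10). The helper names, the score threshold, and the sample response are assumptions for illustration, not real scores.

```python
def placement_request(instance_types, target_capacity, regions):
    """Build keyword arguments for a GetSpotPlacementScores call."""
    return {
        "InstanceTypes": list(instance_types),
        "TargetCapacity": target_capacity,
        "RegionNames": list(regions),
        "SingleAvailabilityZone": False,  # score whole Regions, not single AZs
    }


def best_placements(response, min_score=7):
    """Return entries scoring at least min_score, highest score first."""
    scores = response.get("SpotPlacementScores", [])
    good = [s for s in scores if s["Score"] >= min_score]
    return sorted(good, key=lambda s: s["Score"], reverse=True)


# Invented sample response, shaped like the API output.
sample = {
    "SpotPlacementScores": [
        {"Region": "us-east-1", "Score": 6},
        {"Region": "us-west-2", "Score": 9},
        {"Region": "eu-west-1", "Score": 7},
    ]
}
```

With boto3 and credentials available, the request dict could be passed as `boto3.client("ec2").get_spot_placement_scores(**placement_request(...))`. A score reflects the likelihood that the request succeeds at that moment, so it is best treated as a hint and not as a guarantee.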
Q: Can I use Spot Instances for databases or production web servers?
A: Not for databases — interruptions cause data loss or corruption in self-managed databases. For production web servers, you can use Spot as part of a mixed-instance Auto Scaling Group where On-Demand or Savings Plans instances handle the baseline and Spot instances handle burst traffic. The ALB health check automatically routes traffic away from interrupted instances. Never run a single-instance critical service on Spot.
This article reflects AWS features and pricing as of 2026. AWS services evolve rapidly — always verify against the official EC2 Spot Instances documentation before making production decisions.