Lambda Cold Start: What It Is and How to Mitigate It
Definition
A Lambda cold start is the additional latency incurred when AWS Lambda has to create a fresh execution environment to serve a request, instead of reusing a warm one. During a cold start, Lambda allocates a Firecracker microVM, downloads and unpacks your deployment package (or container image), initializes your language runtime, runs any init code outside the handler, and — for VPC functions — attaches an Elastic Network Interface. Cold starts typically add 100 milliseconds to several seconds of latency on top of normal invocation time, depending on runtime, package size, memory, and VPC settings. Understanding and mitigating cold starts is one of the most common performance-tuning concerns in serverless architectures.
How It Works
Lambda maintains a pool of execution environments per function version. When an invocation arrives:
- Warm path — if an idle environment already exists, Lambda reuses it. Only your handler code runs; latency is typically single-digit to low-tens of milliseconds.
- Cold path — if none is available (first invocation, concurrency burst, scale-up, code or config change), Lambda builds a new one.
A cold start includes these phases:
- Download code — pull the ZIP package or container image layers to the worker host. Cached layers speed later cold starts.
- Start microVM — Firecracker microVM boots.
- Start runtime — the Node, Python, Java, .NET, Go, or Ruby runtime initializes.
- INIT phase — your code outside the handler runs (imports, DB client construction, SDK client creation). INIT is limited to 10 seconds; if it runs longer, Lambda times it out and re-runs initialization as part of the first invocation. You are billed for init time at your configured memory allocation.
- ENI attach (VPC only) — since Hyperplane ENIs (2019), Lambda pre-creates ENIs per subnet+security-group combination and shares them across functions, so the per-invocation ENI attach cost collapsed from ~10 seconds to a few milliseconds.
Once the environment is warm, it is kept around (typically for minutes) and reused for subsequent invocations. Environments are destroyed when idle too long, when concurrency scales down, or when a new code/config version is deployed.
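The warm-reuse behavior is why init work belongs at module scope. A minimal sketch (the `make_db_client` function and its dict return value are stand-ins for a real client such as boto3):

```python
# Sketch of the warm-reuse pattern: module-scope code runs once per
# execution environment (the INIT phase); the handler runs per invocation.
import time

INIT_COUNT = {"n": 0}  # incremented only when the module is (re)initialized

def make_db_client():
    INIT_COUNT["n"] += 1
    time.sleep(0.01)           # pretend this is expensive setup
    return {"connected": True}

db = make_db_client()          # INIT phase: runs once per cold start

def handler(event, context=None):
    # Warm invocations reuse `db`; only this function body re-runs.
    return {"init_count": INIT_COUNT["n"], "db": db["connected"]}
```

Calling the handler repeatedly in the same environment shows the expensive setup ran exactly once; a cold start would re-run the whole module.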
Key Features and Limits
- Cold-start duration varies by runtime. Typical rough ranges:
- Node.js, Python: 100–400 ms.
- Go, Ruby: 100–500 ms.
- Java: 500 ms – several seconds (JVM startup + classloading).
- .NET: similar to Java when JIT-compiled; compiling ahead of time with NativeAOT brings cold starts down to Node/Python-like numbers.
- Custom runtimes / container images: depends on image size and caching.
- Factors that increase cold-start latency:
- Deployment package size (ZIP or container image).
- Number and size of dependencies.
- Heavyweight init work (SDK client creation, DB connections, framework bootstrap).
- VPC attachment — tiny with Hyperplane ENIs, but the first ENI per SG/subnet can still be slow.
- Low memory settings (vCPU scales linearly with memory).
- Lambda SnapStart — for supported runtimes (Java 11/17/21, Python 3.12+, .NET 8+), Lambda snapshots the initialized execution environment's memory and disk when you publish a version, then restores from the snapshot on cold start. Typical Java cold starts drop from seconds to sub-second.
- Provisioned Concurrency — you pre-warm a number of environments that stay permanently initialized, so those invocations skip cold starts entirely. Billed per GB-second of provisioned time.
- Lambda Layers — shared code across functions; layers are extracted into the environment when it is created and count toward the 250 MB unzipped package limit. Keep layers small.
- Arm64 / Graviton2 — usually starts slightly faster than x86_64 and costs ~20% less.
- Container images up to 10 GB — larger than ZIP's 250 MB unzipped limit but with comparable cold-start cost if you leverage AWS-maintained base images and image caching.
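One way to act on the "heavyweight init" factor above is to defer imports that only some code paths need. A sketch, where stdlib `json` stands in for a genuinely heavy dependency (pandas, numpy, a large SDK module):

```python
# Defer a heavy import so it only costs time on the code path that
# needs it, not on every cold start.
_heavy = None

def get_heavy():
    global _heavy
    if _heavy is None:
        import json            # deferred: paid on first use, then cached
        _heavy = json
    return _heavy

def handler(event, context=None):
    if event.get("needs_heavy"):
        return get_heavy().dumps({"ok": True})
    return "fast path"         # cold start untouched by the heavy import
```

The trade-off: the first invocation that takes the heavy path pays the import cost inside the handler, so keep dependencies every invocation needs at module scope.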
Common Use Cases (for cold-start tuning)
- Latency-sensitive APIs — a Lambda behind API Gateway or ALB that must respond in <500 ms. Use SnapStart, Provisioned Concurrency, or a fast runtime.
- Synchronous chat / voice backends — Lex, Connect, and WebSocket handlers where a 3-second cold start is a dealbreaker.
- Event-driven workers — S3, SQS, and EventBridge consumers where cold starts are less visible but still affect throughput on scale-up.
- Scheduled cron functions — rarely invoked functions that are almost always cold; SnapStart or a keep-warm EventBridge rule helps.
- CDN edge compute — Lambda@Edge and CloudFront Functions cold-start behave differently; CloudFront Functions are built for sub-millisecond cold starts.
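The keep-warm pattern mentioned for scheduled functions usually pairs an EventBridge rule (e.g. `rate(5 minutes)`) with a handler that short-circuits the ping. A sketch; the `{"keep_warm": true}` payload shape is an assumption, not an AWS convention:

```python
# A handler that short-circuits keep-warm pings so they don't run
# business logic, while still keeping the environment alive.
def handler(event, context=None):
    if event.get("keep_warm"):
        return {"warmed": True}    # cheap no-op; environment stays warm
    # ... real work would go here ...
    return {"result": "processed"}
```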
Pricing Model
Cold starts are billed like any other Lambda invocation time: at your memory tier's per-GB-second rate, including the INIT phase (up to 10 seconds). Mitigations have their own costs:
- SnapStart — free for Java; for Python and .NET you pay per GB-second of snapshot cache storage plus a per-restore fee for each cold-started invocation served from a snapshot. Often cheaper than Provisioned Concurrency for low-traffic functions.
- Provisioned Concurrency — per-GB-second while provisioned, whether invoked or not, plus a discounted per-request rate when invoked. Cost-effective when your cold-start rate is high and steady.
- Lambda Layers, container images, and Graviton — no direct price, but reducing package size cuts cold-start indirectly.
Compute Savings Plans apply to Provisioned Concurrency, reducing cost by up to 17%.
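A back-of-envelope comparison makes the Provisioned Concurrency trade-off concrete. The rate below is an illustrative us-east-1 figure; verify against the current AWS Lambda pricing page before relying on it:

```python
# Rough Provisioned Concurrency cost, excluding per-invocation charges.
PC_RATE = 0.0000041667  # USD per GB-second provisioned (illustrative)

def provisioned_cost(memory_gb, concurrency, hours):
    """Cost of keeping `concurrency` environments of `memory_gb` GB
    provisioned for `hours`, whether or not they are invoked."""
    gb_seconds = memory_gb * concurrency * hours * 3600
    return gb_seconds * PC_RATE

# Two 1 GB environments kept warm for a 730-hour month:
monthly = provisioned_cost(1, 2, 730)   # ~ $21.90
```

At that scale the idle cost is modest; it grows linearly with memory and concurrency, which is why over-provisioning erodes Lambda's pay-per-use economics.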
Pros and Cons (of mitigations)
Pros
- Sub-second Java cold starts via SnapStart make serverless viable for latency-sensitive JVM apps.
- Provisioned Concurrency eliminates cold starts entirely for the provisioned fleet.
- Graviton + NativeAOT + smaller packages give consistent cold starts of a few hundred milliseconds for Node, Python, Go, and .NET AOT.
Cons
- Provisioned Concurrency costs money while idle — defeats Lambda's pay-per-use appeal if overused.
- SnapStart has runtime caveats: state captured in the snapshot (random seeds, open DB connections, cached credentials) can be stale or non-unique after restore; you must refresh it via runtime hooks (e.g., CRaC's beforeCheckpoint/afterRestore in Java).
- Keep-warm pings are brittle and mask the real problem.
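The restore-hook caveat above follows a common shape. Real code would use the runtime's hook API (org.crac in Java, the snapshot-restore-py helpers in Python); the registry below is a local stand-in so the sketch runs anywhere:

```python
# Pattern sketch for SnapStart restore hooks: state captured in the
# snapshot must be re-established after restore. The hook registry here
# is a hypothetical stand-in for the runtime's real hook API.
_after_restore_hooks = []

def register_after_restore(fn):
    _after_restore_hooks.append(fn)
    return fn

STATE = {"db": None}

@register_after_restore
def reconnect():
    # Connections frozen into the snapshot are stale after restore;
    # rebuild them here. A dict stands in for a real connection object.
    STATE["db"] = {"connected": True, "fresh": True}

def on_restore():
    # The runtime invokes registered hooks after restoring a snapshot.
    for fn in _after_restore_hooks:
        fn()
```

The same hook mechanism is where you should reseed randomness and refresh any cached credentials.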
Comparison with Alternatives
| | Lambda (on-demand) | Lambda + SnapStart | Lambda + Provisioned Concurrency | Fargate / ECS |
| --- | --- | --- | --- | --- |
| Typical cold start | 100 ms – several s | <1 s for Java/Python/.NET AOT | 0 ms (pre-warmed) | 30–90 s per task launch |
| Idle cost | $0 | Low (snapshot cache) | Per-GB-second provisioned | Per-task-second always |
| Best for | Low-to-medium traffic, async | Heavy runtimes with bursty traffic | Latency-critical APIs with predictable floor | Steady containers, long-running jobs |
If you need always-on millisecond latency, Provisioned Concurrency or Fargate/ECS with ALB is usually the right answer over hoping on-demand Lambda warms in time.
Exam Relevance
- Developer Associate (DVA-C02) — know cold-start vs warm path; INIT phase; how to move heavy work outside the handler and cache it; when to use SnapStart vs Provisioned Concurrency.
- Solutions Architect Associate (SAA-C03) — choose the right serverless architecture for latency-sensitive workloads; recognize that VPC attach is no longer a major cold-start cost due to Hyperplane ENIs.
- DevOps Professional (DOP-C02) — roll out Provisioned Concurrency with Application Auto Scaling; align deployment strategy to avoid cold-start surges during rollout (aliases + weighted traffic).
Common exam trap: old study guides say "put Lambda in a VPC to cause huge cold starts." Since the Hyperplane ENI change (2019), VPC cold-start overhead is negligible. The modern cold-start culprits are heavy runtimes (Java/.NET) and big packages — and the modern fix is SnapStart or Provisioned Concurrency.
Frequently Asked Questions
Q: Which Lambda runtime has the shortest cold start?
A: In general, Node.js, Python, and Go have the shortest cold starts (roughly 100–400 ms at modest memory sizes). Ruby is similar. Java and .NET are the slowest without mitigation, often 500 ms to several seconds due to JVM/CLR startup and classloading. Lambda SnapStart (available for Java, Python, and .NET) brings Java cold starts into the sub-second range by restoring a pre-initialized snapshot. Arm64 (Graviton2) is usually slightly faster than x86_64 and ~20% cheaper.
Q: Should I use SnapStart or Provisioned Concurrency?
A: Use SnapStart when you have a heavy runtime (Java, Python, .NET) and want to cut cold starts without paying for always-on capacity — it is free to enable for Java and inexpensive for Python/.NET, and it significantly reduces cold-start p99 latency. Use Provisioned Concurrency when you need zero cold starts for a known baseline of traffic, regardless of runtime, and you can justify the per-GB-second cost. Many production apps combine both: Provisioned Concurrency for the steady base, SnapStart for the bursty tail.
Q: How do I reduce cold-start latency without SnapStart or Provisioned Concurrency?
A: First, shrink the package — remove unused dependencies, tree-shake, and move large assets out of the deployment bundle. Second, defer heavy init — lazy-load SDK clients and DB connections you may not need, but keep truly reusable clients in module scope so they persist across warm invocations. Third, increase memory — vCPU scales with memory, so higher allocations usually speed the first request's handler work (the INIT phase itself runs with a CPU boost regardless of memory size). Fourth, choose Graviton (arm64). Fifth, prefer AWS-provided base container images, whose layers are widely cached on Lambda hosts. Finally, avoid redeploying latency-critical functions during peak hours, since every deploy invalidates all warm environments.
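Before tuning, measure: Lambda's end-of-invocation REPORT log line includes an "Init Duration" field only when the invocation was a cold start, so parsing it tells you your actual cold-start rate and cost. A small sketch:

```python
# Detect cold starts from CloudWatch Logs REPORT lines. "Init Duration"
# appears only on cold-started invocations.
import re

REPORT_RE = re.compile(r"Init Duration: ([\d.]+) ms")

def init_duration_ms(report_line):
    """Return INIT duration in ms, or None for a warm invocation."""
    m = REPORT_RE.search(report_line)
    return float(m.group(1)) if m else None

cold = ("REPORT RequestId: abc Duration: 12.3 ms Billed Duration: 13 ms "
        "Memory Size: 128 MB Max Memory Used: 50 MB Init Duration: 245.33 ms")
warm = "REPORT RequestId: def Duration: 9.8 ms Billed Duration: 10 ms"
```

Running this over a CloudWatch Logs Insights export (or a metric filter on the log group) gives you the p50/p99 of init time to compare before and after a mitigation.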
This article reflects AWS features and pricing as of 2026. AWS services evolve rapidly — always verify against the official AWS Lambda documentation before making production decisions.