AWS Lambda: What It Is and When to Use It

Definition

AWS Lambda is a serverless, event-driven compute service that runs your code in response to events and automatically manages the underlying compute resources. You upload a function (as a ZIP file or container image), configure a trigger (an API Gateway request, an S3 upload, a DynamoDB stream, a scheduled EventBridge rule, or a direct SDK invocation), and AWS runs it on demand — scaling from zero to thousands of concurrent executions in seconds and charging only for the time your code is actually running.

How It Works

Lambda follows an event-driven model. An event source invokes a function, Lambda spins up an isolated execution environment (a lightweight microVM built on the Firecracker hypervisor), loads your code, and runs the handler:

  1. Event sources — synchronous (API Gateway, Application Load Balancer, Function URLs, direct Invoke), asynchronous (S3, SNS, EventBridge), or poll-based (SQS, Kinesis, DynamoDB Streams, Amazon MSK, self-managed Apache Kafka).
  2. Execution environment — a sandboxed microVM with your chosen runtime (Node.js, Python, Java, .NET, Go, Ruby, or a custom runtime via the Runtime API or a container image up to 10 GB).
  3. Handler function — the entry point AWS calls with an event payload and a context object describing the invocation. The handler returns a result or raises an error, and Lambda ships its log output to CloudWatch Logs.
  4. Scaling — for each concurrent request, Lambda reuses a warm environment if one is available (the "warm path"); otherwise it provisions a fresh one (the cold start). Environments are kept warm for several minutes between invocations, though the exact duration is not guaranteed.
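The handler model in step 3 can be sketched as a minimal Python function. The function name, payload field, and response shape below are illustrative — any handler name works as long as it matches the function's configured handler setting:

```python
import json

def lambda_handler(event, context):
    """Entry point Lambda calls with the event payload and a context object.

    `event` is a dict whose shape depends on the trigger; `context` exposes
    metadata such as the request ID and remaining execution time.
    """
    name = event.get("name", "world")  # illustrative payload field
    body = {"message": f"Hello, {name}!"}
    # A synchronous caller (e.g. API Gateway proxy integration) expects
    # a statusCode and a JSON-serialized body in the return value.
    return {"statusCode": 200, "body": json.dumps(body)}
```

Invoked with `{"name": "Lambda"}`, this returns a 200 response whose body is `{"message": "Hello, Lambda!"}`.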

Functions can join a VPC to reach private resources (RDS, ElastiCache, or on-premises through Direct Connect). Versions and Aliases allow immutable deployments and weighted-traffic rollouts.

Key Features and Limits

  • Memory: 128 MB to 10,240 MB in 1 MB increments. vCPU scales proportionally with memory — more memory = more CPU.
  • Timeout: 1 second to 15 minutes per invocation.
  • Ephemeral storage (/tmp): 512 MB up to 10,240 MB, configurable per function.
  • Deployment package: 50 MB zipped / 250 MB unzipped via direct upload; up to 10 GB via container image.
  • Default concurrency: 1,000 concurrent executions per account per Region (soft quota, raisable).
  • Environment variables: up to 4 KB per function.
  • Payload size: 6 MB for synchronous invocation, 256 KB for asynchronous.
  • Layers: up to 5 Layers per function; the unzipped size of the function plus all its Layers must stay within the 250 MB limit. Great for sharing dependencies across functions.
  • Cold starts: 100 ms – several seconds depending on runtime, package size, and VPC configuration. Mitigated with Provisioned Concurrency or Lambda SnapStart (Java, Python 3.12+, .NET 8+).
  • Response streaming: stream large responses back to HTTP callers in chunks (up to 20 MB).
  • Function URLs: built-in HTTPS endpoints without API Gateway.
  • Lambda extensions: sidecar-like processes that share the execution environment — used by observability vendors (Datadog, New Relic) to flush telemetry.
  • Durable, long-running workflows via Step Functions — orchestrate Lambda functions in state machines that can run for up to 1 year.
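Because vCPU scales linearly with memory, you can estimate the CPU share a given memory setting buys. AWS documents that a function receives the equivalent of one full vCPU at 1,769 MB; the helper below is a rough sketch based on that ratio:

```python
FULL_VCPU_MEMORY_MB = 1769  # memory at which Lambda allocates ~1 full vCPU

def estimated_vcpus(memory_mb: int) -> float:
    """Approximate vCPU share for a Lambda memory setting (128-10,240 MB)."""
    if not 128 <= memory_mb <= 10240:
        raise ValueError("Lambda memory must be between 128 and 10,240 MB")
    return memory_mb / FULL_VCPU_MEMORY_MB
```

At 128 MB you get roughly 0.07 of a vCPU; at the 10,240 MB ceiling, roughly 5.8 vCPUs — which is why CPU-bound functions often get faster (and sometimes cheaper per invocation) simply by raising the memory setting.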

Common Use Cases

  1. REST and GraphQL API backends — Lambda behind API Gateway (REST or HTTP API) is the canonical serverless web backend.
  2. S3 event processing — resize images on upload, scan PDFs with Textract, index documents into OpenSearch.
  3. Data pipeline steps — Kinesis Data Streams or MSK trigger Lambda for ETL and enrichment; the processed records flow into S3, DynamoDB, or Redshift.
  4. Scheduled jobs and automation — EventBridge Rules invoke Lambda on a cron schedule for maintenance tasks (expire RDS snapshots, rotate secrets, cleanup CloudWatch Log Groups).
  5. Glue code between services — SNS topic fan-out, EventBridge filtering, cross-account event forwarding, CloudTrail response automation.
  6. IoT and mobile backends — AWS IoT Rules or AWS AppSync resolvers delegate business logic to Lambda.
  7. Event-driven integration with generative AI — orchestrate Bedrock Agents, call Amazon Q, or run RAG retrieval on S3 + OpenSearch.
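As a concrete illustration of use case 2, an S3-triggered function receives a `Records` list describing the uploaded objects. The sketch below only parses the event payload; a real handler would go on to fetch and process each object with boto3:

```python
from urllib.parse import unquote_plus

def lambda_handler(event, context):
    """Extract (bucket, key) pairs from an S3 event notification payload."""
    objects = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # S3 delivers object keys URL-encoded (spaces arrive as '+').
        key = unquote_plus(record["s3"]["object"]["key"])
        objects.append((bucket, key))
        # A real handler would now download and process the object,
        # e.g. s3.get_object(Bucket=bucket, Key=key) via a boto3 client.
    return objects
```

Decoding the key with `unquote_plus` matters in practice: a file named `cat pic.jpg` arrives in the event as `cat+pic.jpg`, and passing the raw key back to S3 would fail with a NoSuchKey error.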

Pricing Model

Lambda pricing has three components:

  1. Request pricing — per 1 million requests. Free Tier includes 1 million requests/month forever.
  2. Duration pricing — priced per GB-second (memory × duration). Free Tier includes 400,000 GB-seconds/month. The x86 rate is higher than Graviton (Arm); switching a compatible function to arm64 cuts duration cost by ~20%.
  3. Provisioned Concurrency — you pre-warm N concurrent environments; you pay a GB-second charge for the standby time, and invocations served by provisioned capacity are billed at a lower duration rate.
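The GB-second math can be worked through in a few lines. The rates below are illustrative (roughly the published us-east-1 x86 rates at the time of writing — verify current pricing before relying on them):

```python
PRICE_PER_MILLION_REQUESTS = 0.20   # USD, illustrative x86 rate
PRICE_PER_GB_SECOND = 0.0000166667  # USD, illustrative x86 rate

def monthly_cost(requests: int, memory_mb: int, avg_duration_ms: float) -> float:
    """Estimate monthly Lambda cost before Free Tier deductions."""
    # GB-seconds = invocations x duration (s) x memory (GB)
    gb_seconds = requests * (avg_duration_ms / 1000) * (memory_mb / 1024)
    duration_cost = gb_seconds * PRICE_PER_GB_SECOND
    request_cost = (requests / 1_000_000) * PRICE_PER_MILLION_REQUESTS
    return duration_cost + request_cost
```

One million 100 ms invocations at 512 MB works out to 50,000 GB-seconds, roughly $1.03/month at these rates — and the perpetual Free Tier would cover it entirely.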

Extras: data transfer out, ephemeral storage beyond 512 MB, and function URLs / API Gateway each have their own bill lines. Use the AWS Pricing Calculator to model spiky workloads — Lambda's "pay for what you use" model is usually cheaper than EC2 below a certain sustained utilization, and more expensive above it.

Pros and Cons

Pros

  • No servers to manage — patching, scaling, and high availability are AWS's job.
  • Scale to zero — you pay nothing when there is no traffic.
  • Fine-grained scaling — each request gets an execution environment, up to the concurrency limit.
  • Native integrations with 200+ AWS services through EventBridge and direct triggers.
  • Per-function IAM roles enforce least privilege.

Cons

  • 15-minute hard limit — long-running workloads need Step Functions, Fargate, or EC2.
  • Cold starts — still painful for latency-sensitive APIs in Java/.NET without SnapStart or Provisioned Concurrency.
  • Deployment package size limits make large ML models cumbersome without container images.
  • Statelessness — all state must live in external services (DynamoDB, ElastiCache, S3).
  • Cost surprises at very high, steady request volumes — Fargate or EC2 may be cheaper past a certain break-even point.

Comparison with Alternatives

| | Lambda | Fargate | EC2 |
| --- | --- | --- | --- |
| Unit of work | Function (one invocation) | Container task (long-running) | Virtual machine |
| Max runtime | 15 min | Unlimited | Unlimited |
| Scale-to-zero | Yes | Partial (task count = 0) | No (without tricks) |
| Cold start | Sub-second to several seconds | 30–90 seconds | Minutes |
| Pricing | Per request + GB-second | Per vCPU-second + GB-second | Per instance-second |
| Best for | Event-driven glue, short APIs | Long-running stateless containers | Legacy apps, HPC, long-running services |

Compared with Google Cloud Functions and Azure Functions, Lambda has the deepest ecosystem of triggers and integrations, but all three offer similar core mechanics.

Exam Relevance

Lambda is core material across several AWS certifications:

  • Cloud Practitioner (CLF-C02) — what serverless means, why it matters.
  • Solutions Architect Associate (SAA-C03) — picking Lambda vs ECS vs Fargate, API Gateway + Lambda patterns, scaling limits, VPC impact on cold starts.
  • Developer Associate (DVA-C02) — heavy coverage: Lambda Layers, SnapStart, concurrency (reserved vs provisioned), versioning, aliases, traffic shifting, environment variables, SAM templates, error handling with Dead Letter Queues / on-failure destinations.
  • DevOps Professional (DOP-C02) — CodeDeploy Canary/Linear deploys of Lambda, Step Functions orchestration, observability via X-Ray and Powertools.

Key exam gotchas: synchronous retry semantics differ from asynchronous (async retries twice, then goes to DLQ); reserved concurrency also acts as a throttle (capping maximum concurrency), while provisioned concurrency pre-initializes environments but does not itself limit scale.

Frequently Asked Questions

Q: What is a Lambda cold start and how can I reduce it?

A: A cold start is the latency incurred when Lambda has to initialize a fresh execution environment before running your handler — typically 100 ms to several seconds, worst for Java and .NET. You can reduce cold starts by using Lambda SnapStart (for Java, Python 3.12+, .NET 8+), enabling Provisioned Concurrency, trimming large dependencies, and moving expensive setup to module scope so warm invocations skip it. VPC attachment adds far less cold-start overhead than it once did, now that Lambda uses shared Hyperplane ENIs by default.
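One mitigation worth showing in code: do expensive initialization once at module load, so only the cold start pays for it and every warm invocation reuses the result. The sketch below simulates the pattern — `heavy_init` is a stand-in for creating SDK clients, loading config, or opening connection pools:

```python
_INIT_COUNT = 0  # counts how many times cold-start init actually ran

def heavy_init():
    """Stand-in for expensive one-time setup (SDK clients, config, pools)."""
    global _INIT_COUNT
    _INIT_COUNT += 1
    return {"client": "ready"}

# Module scope runs once per execution environment, during the cold start.
# Every warm invocation of the handler reuses RESOURCES without re-paying.
RESOURCES = heavy_init()

def lambda_handler(event, context):
    return {"init_runs": _INIT_COUNT, "client": RESOURCES["client"]}
```

Two back-to-back invocations in the same environment both report `init_runs == 1`; only a new (cold) environment runs `heavy_init` again.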

Q: How is Lambda billed?

A: Lambda bills you per request and per GB-second of execution duration (memory allocated × time, rounded to the millisecond). The Free Tier includes 1 million requests and 400,000 GB-seconds per month indefinitely. Provisioned Concurrency and the Arm (Graviton) architecture are separate rates; Arm is roughly 20% cheaper for supported runtimes.

Q: When should I choose Lambda instead of ECS or EC2?

A: Choose Lambda when your workload is event-driven, bursty, runs for less than 15 minutes per invocation, and benefits from scale-to-zero. Choose ECS/Fargate for long-running stateless containers or workloads that exceed Lambda's limits. Choose EC2 for workloads that need full OS control, specific hardware, or are cheaper to run on reserved capacity at high sustained utilization.


This article reflects AWS features and pricing as of 2026. AWS services evolve rapidly — always verify against the official AWS Lambda documentation before making production decisions.

Published: 4/16/2026

