AWS X-Ray: What It Is and When to Use It

Definition

AWS X-Ray is AWS's distributed tracing service. It collects trace data as requests flow through a distributed application — across Lambda functions, ECS / EKS containers, EC2 instances, API Gateway, DynamoDB, SQS, SNS, and more — and renders a service map plus per-trace timelines so you can see exactly where time is spent and where errors originate. X-Ray is the observability layer that answers "why is this request slow?" or "which microservice returned the 500?" in a way that CloudWatch Metrics and Logs alone cannot. It's compatible with OpenTelemetry via the AWS Distro for OpenTelemetry (ADOT), so teams can export traces to X-Ray alongside third-party backends.

How It Works

Traces, segments, subsegments

A trace represents the full path of one request through your system, identified by a unique trace ID (propagated in an X-Amzn-Trace-Id header). Each service the request touches emits a segment — a record of that service's work, with start/end time, HTTP metadata, errors, and faults. Within a segment, subsegments record downstream calls (DynamoDB query, external HTTP request, custom code block).
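To make the trace ID and header format concrete, here is a minimal Python sketch that builds and parses an X-Amzn-Trace-Id value. The helper names (new_trace_id, build_trace_header, parse_trace_header) and the sample parent ID are illustrative assumptions, not part of any AWS SDK; in practice the SDK or the calling AWS service generates and propagates this header for you.

```python
import os
import time


def new_trace_id(now=None):
    """Build an X-Ray style trace ID: version 1, epoch seconds in hex, 96 random bits in hex."""
    epoch = int(time.time() if now is None else now)
    return f"1-{epoch:08x}-{os.urandom(12).hex()}"


def build_trace_header(trace_id, parent_id, sampled):
    """Render the header value a service would pass to the next hop."""
    return f"Root={trace_id};Parent={parent_id};Sampled={1 if sampled else 0}"


def parse_trace_header(header):
    """Split 'Root=...;Parent=...;Sampled=1' into a dict of its fields."""
    fields = {}
    for part in header.split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            fields[key.strip()] = value.strip()
    return fields


# Hypothetical values: a fixed timestamp and a made-up parent segment ID.
header = build_trace_header(new_trace_id(now=1465510280), "53995c3f42cd8ad8", True)
fields = parse_trace_header(header)
print(fields["Root"], fields["Parent"], fields["Sampled"])
```

The Root field is what ties every segment to one trace, Parent names the segment that made the call, and Sampled carries the upstream sampling decision.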

Each segment can carry:

  • Annotations — indexed key–value pairs (up to 50 per trace), queryable in the console filter expression (e.g., annotation.customerId = "42").
  • Metadata — arbitrary JSON attached to a segment, not indexed but visible per trace.
  • Errors / faults / throttles — classified status for the segment.
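As a rough illustration of how these pieces fit into one record, the sketch below assembles a segment document as a plain Python dict. The field names follow the X-Ray segment document format as I understand it; the helper functions, service names, and values are hypothetical, and the status mapping (4xx as error, 429 as throttle, 5xx as fault) is an assumption based on how the X-Ray SDKs classify HTTP responses.

```python
import json
import os

ANNOTATION_TYPES = (str, int, float, bool)


def new_segment(name, trace_id, start_time, end_time):
    """Create a bare segment document for one service's share of a request."""
    return {
        "name": name,
        "id": os.urandom(8).hex(),  # 16 hex digits
        "trace_id": trace_id,
        "start_time": start_time,   # epoch seconds, fractions allowed
        "end_time": end_time,
        "annotations": {},
        "metadata": {},
        "subsegments": [],
    }


def annotate(segment, key, value):
    """Annotations are indexed, so only scalar values are accepted here."""
    if not isinstance(value, ANNOTATION_TYPES):
        raise TypeError("annotation values must be a string, number, or boolean")
    segment["annotations"][key] = value


def apply_status(segment, status_code):
    """Set the error / throttle / fault flags from an HTTP status code."""
    if status_code >= 500:
        segment["fault"] = True
    elif status_code == 429:
        segment["throttle"] = True
        segment["error"] = True
    elif status_code >= 400:
        segment["error"] = True


segment = new_segment("checkout-fn", "1-5759e988-bd862e3fe1be46a994272793",
                      1465510280.0, 1465510280.42)
annotate(segment, "customerId", "42")                # indexed, filterable
segment["metadata"]["cart"] = {"items": [101, 102]}  # arbitrary JSON, not indexed
segment["subsegments"].append({                      # one downstream call
    "id": os.urandom(8).hex(),
    "name": "DynamoDB",
    "namespace": "aws",
    "start_time": 1465510280.05,
    "end_time": 1465510280.21,
})
apply_status(segment, 502)
document = json.dumps(segment)
```

Keeping identifiers you want to search on (customerId) in annotations and bulky debugging context (the cart) in metadata matches the indexed versus not-indexed split described above.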

How traces are produced

  • Lambda, API Gateway, App Runner, Step Functions — native X-Ray integration. Turn it on with a single flag; Lambda automatically emits segments for its invocation and any AWS SDK calls made from the function.
  • ECS / EKS / EC2 — run the X-Ray daemon (a small UDP listener) as a sidecar or host process. Application code uses the X-Ray SDK (or OpenTelemetry with the ADOT collector) to send segments to the daemon, which batches and ships them to the X-Ray API.
  • SDKs / OpenTelemetry — the X-Ray SDK (Java, Node.js, Python, Go, .NET, Ruby) instruments AWS SDK calls, HTTP clients (axios, requests, http), and popular ORMs (Sequelize, SQLAlchemy). The AWS Distro for OpenTelemetry (ADOT) is the modern recommendation for new applications, since OpenTelemetry traces can simultaneously feed X-Ray and vendor tools (Datadog, Honeycomb).
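The SDK-to-daemon hop can be sketched with nothing but the standard library. The daemon listens for UDP datagrams (port 2000 by default) made of a small JSON header line, a newline, and the segment document. The function names below are hypothetical and the address parsing only handles the simple host:port form; the real SDKs add batching, retries, and context management that this sketch leaves out.

```python
import json
import socket

# Header line the daemon expects before each segment document.
DAEMON_HEADER = '{"format": "json", "version": 1}'
DEFAULT_DAEMON_ADDRESS = "127.0.0.1:2000"


def parse_daemon_address(raw=DEFAULT_DAEMON_ADDRESS):
    """Turn 'host:port' into a (host, port) tuple usable with sendto()."""
    host, _, port = raw.rpartition(":")
    return host, int(port)


def encode_datagram(segment):
    """Serialize one segment as header line + newline + JSON body."""
    return (DAEMON_HEADER + "\n" + json.dumps(segment)).encode("utf-8")


def send_segment(segment, address=None):
    """Fire-and-forget one segment to the daemon over UDP."""
    target = parse_daemon_address() if address is None else address
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(encode_datagram(segment), target)


# Hypothetical segment; nothing is sent until send_segment() is called.
example = {
    "name": "payment-svc",
    "id": "70de5b6f19ff9a0a",
    "trace_id": "1-5759e988-bd862e3fe1be46a994272793",
    "start_time": 1465510280.0,
    "end_time": 1465510280.3,
}
payload = encode_datagram(example)
```

Because the transport is UDP, the application never blocks on tracing: if the daemon is down, segments are simply lost rather than slowing the request.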

Service map

The X-Ray console's service map visualizes services as nodes and calls as edges, color-coded by error rate and labeled with latency percentiles. At a glance you can see "API Gateway → checkout-fn → payment-svc (8% fault) → Stripe (p99 1.8s)".

Sampling rules

X-Ray doesn't trace every request by default — it samples. The default rule records the first request each second plus 5% of subsequent requests per service. You can define custom sampling rules per service, URL pattern, or HTTP method to record high-value traces (checkout, payments) at 100% and low-value ones (health checks) at near zero.
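The "first request each second plus 5%" behaviour can be modelled locally as a per-second reservoir followed by a fixed rate. This is a simplified sketch for intuition only; the LocalSampler class is hypothetical, and the real SDKs coordinate reservoir quotas with the X-Ray service rather than deciding purely in process.

```python
import random


class LocalSampler:
    """Sample up to reservoir_size requests per second, then a fixed fraction of the rest."""

    def __init__(self, reservoir_size=1, fixed_rate=0.05, rng=None):
        self.reservoir_size = reservoir_size
        self.fixed_rate = fixed_rate
        self.rng = rng if rng is not None else random.Random()
        self._second = None
        self._used = 0

    def should_sample(self, now):
        second = int(now)
        if second != self._second:  # a new second refills the reservoir
            self._second = second
            self._used = 0
        if self._used < self.reservoir_size:
            self._used += 1
            return True
        return self.rng.random() < self.fixed_rate


# Ten seconds of traffic at 100 requests per second (synthetic timestamps).
timestamps = [second + i / 100 for second in range(10) for i in range(100)]
sampler = LocalSampler(reservoir_size=1, fixed_rate=0.05, rng=random.Random(7))
sampled = sum(sampler.should_sample(t) for t in timestamps)
print(f"sampled {sampled} of {len(timestamps)} requests")
```

The reservoir guarantees that even a low-traffic service produces some traces every second, while the fixed rate keeps the volume of a busy service proportional to a small fraction of its traffic.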

X-Ray Insights

X-Ray Insights is an ML-based feature that automatically detects anomalies, such as latency or fault-rate spikes, on your service graph and opens a grouped insight with root-cause analysis. This reduces the need to watch dashboards manually.

Trace analytics

Trace analytics runs aggregate queries over traces, for example "p99 latency for POST /checkout by customer tier yesterday." Grouping by a dimension such as customer tier relies on the annotations you set at instrumentation time.
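To show what such an aggregate computes, the sketch below groups trace summaries by an annotation and takes a nearest-rank p99 per group. It runs over made-up in-memory data rather than the X-Ray API; the record shape (response_time_ms, annotations) and the function names are assumptions chosen for illustration.

```python
import math
from collections import defaultdict


def percentile(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def p99_by_annotation(summaries, key):
    """Group response times by one annotation value and report p99 for each group."""
    groups = defaultdict(list)
    for summary in summaries:
        group = summary["annotations"].get(key, "unknown")
        groups[group].append(summary["response_time_ms"])
    return {group: percentile(times, 99) for group, times in groups.items()}


# Hypothetical trace summaries for POST /checkout.
summaries = (
    [{"annotations": {"tier": "gold"}, "response_time_ms": ms} for ms in range(1, 101)]
    + [{"annotations": {"tier": "silver"}, "response_time_ms": ms} for ms in (500, 700, 900)]
)
result = p99_by_annotation(summaries, "tier")
```

A trace without the annotation lands in the "unknown" group here, which mirrors why the grouping only works for dimensions you annotated when instrumenting.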

CloudWatch integration

X-Ray traces are linked to CloudWatch Logs when you enable "send trace ID to logs" — one click from a trace to the log lines of that specific request. CloudWatch Application Signals (newer) layers APM-style SLO tracking on top of X-Ray traces.
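Trace-to-log correlation works because each log line carries the trace ID of the request that produced it. The sketch below shows one way an application might stamp that ID onto structured log lines itself. It assumes the trace header is available in the _X_AMZN_TRACE_ID environment variable, as it is inside a Lambda function; the class and function names are hypothetical, and this is not the console integration itself.

```python
import json
import logging
import os


def current_trace_id(environ=None):
    """Extract the Root trace ID from the trace header in the environment, if present."""
    environ = os.environ if environ is None else environ
    header = environ.get("_X_AMZN_TRACE_ID", "")
    for part in header.split(";"):
        if part.startswith("Root="):
            return part[len("Root="):]
    return None


class TraceIdFilter(logging.Filter):
    """Attach the current trace ID to every log record."""

    def filter(self, record):
        record.trace_id = current_trace_id()
        return True


class JsonFormatter(logging.Formatter):
    """Emit one JSON object per line so the trace ID is a searchable field."""

    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "message": record.getMessage(),
            "trace_id": getattr(record, "trace_id", None),
        })


def build_logger(name="app"):
    """Wire the filter and formatter onto a stream handler."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler()
    handler.addFilter(TraceIdFilter())
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

Calling build_logger().info("order placed") would then write a JSON line whose trace_id field can be searched in CloudWatch Logs for the ID shown on a trace.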

Key Features and Limits

  • Trace retention: 30 days.
  • Max trace size: 500 KB per trace, 64 KB per segment document.
  • Annotations: up to 50 per trace, strings / numbers / booleans, indexed and filterable.
  • Sampling rules: reservoir (N per second) + fixed percentage; per service/host/URL pattern.
  • Encryption: at rest with default or customer-managed KMS key.
  • Cross-account: traces can be shared with a central monitoring account through CloudWatch cross-account observability, for centralized observability accounts.
  • OpenTelemetry: ADOT collector → X-Ray exporter; W3C Trace Context propagation (in addition to the original X-Amzn-Trace-Id).

Common Use Cases

  1. Microservice latency analysis — "where does the 2s p99 come from in the checkout path?"
  2. Root-cause analysis on incidents — one trace ID ties together API Gateway + Lambda + DynamoDB + SQS + downstream HTTP.
  3. Serverless debugging — X-Ray is practically mandatory for non-trivial Lambda + API Gateway + Step Functions architectures.
  4. Canary and blue/green validation — compare traces and error rates between versions.
  5. Per-customer performance — annotations (customerId, tenantId) surface slow-customer patterns.
  6. Correlation with logs — jump from a specific trace to its structured log lines in CloudWatch.
  7. Third-party API tracking — the X-Ray SDK traces outgoing HTTP calls, so you see Stripe / Twilio latency in the same map.

Pricing Model

X-Ray charges per trace, split across three operations:

  • Traces recorded — per million traces ingested.
  • Traces retrieved — per million traces retrieved via GetTraceSummaries, BatchGetTraces, or the console.
  • Traces scanned — per million traces scanned during trace analytics queries.

Free Tier: 100,000 traces recorded, plus 1,000,000 traces retrieved or scanned (combined), per month, indefinitely.

Data transfer is included. Sampling is your primary cost lever — default rules keep costs predictable even for high-traffic services. The X-Ray daemon and SDK are free.
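A back-of-the-envelope estimate shows why sampling dominates the bill. The sketch below models a single service under the default rule (one trace per second plus 5% of the rest) and applies the 100,000-trace free tier to recorded traces only. The price_per_million argument is a placeholder you must fill in from the current AWS pricing page; the value used in the example is an assumption, as are the function names and the 30-day month.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumes a 30-day month
FREE_RECORDED_TRACES = 100_000      # free tier for recorded traces, per the text


def sampled_traces_per_month(requests_per_second, reservoir_per_second=1, fixed_rate=0.05):
    """Traces recorded per month for steady traffic under reservoir + fixed-rate sampling."""
    from_reservoir = min(requests_per_second, reservoir_per_second)
    remainder = max(requests_per_second - reservoir_per_second, 0)
    return (from_reservoir + remainder * fixed_rate) * SECONDS_PER_MONTH


def recorded_cost(traces, price_per_million):
    """Cost of recorded traces after subtracting the free tier."""
    billable = max(traces - FREE_RECORDED_TRACES, 0)
    return billable / 1_000_000 * price_per_million


# 200 requests per second; 5.00 per million is a placeholder, not a quoted price.
monthly_traces = sampled_traces_per_month(requests_per_second=200)
estimate = recorded_cost(monthly_traces, price_per_million=5.00)
unsampled = recorded_cost(200 * SECONDS_PER_MONTH, price_per_million=5.00)
print(f"{monthly_traces:,.0f} traces/month, estimate {estimate:.2f} vs {unsampled:.2f} unsampled")
```

At 200 requests per second the default rule records roughly 28 million traces a month instead of more than 500 million, which is the gap between the two estimates printed above.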

Pros and Cons

Pros

  • Native AWS integration — Lambda, API Gateway, App Runner, Step Functions trace out of the box.
  • Service map is an instant big-picture view of a microservice architecture.
  • Deep AWS SDK instrumentation — every DynamoDB / S3 / SQS / HTTP call shows as a subsegment.
  • OpenTelemetry compatibility via ADOT avoids lock-in.
  • Generous free tier covers many small-to-medium apps.

Cons

  • Tracing must be enabled per service; an un-instrumented service becomes a "black node" on the map.
  • 30-day retention is shorter than most third-party APMs.
  • Annotation cap (50 per trace) limits rich filtering compared to high-cardinality trace systems.
  • Console UX is functional but less polished than Datadog / Honeycomb / Jaeger.
  • No full APM features (browser RUM, profiling, logs unification) — those live in separate CloudWatch products.

Comparison with Alternatives

| | X-Ray | CloudWatch Application Signals | OpenTelemetry + Jaeger | Datadog APM |
| --- | --- | --- | --- | --- |
| Scope | Distributed tracing on AWS | SLO-focused APM built on X-Ray | OSS tracing, self-hosted | Full SaaS APM (traces + RUM + metrics) |
| AWS integration | Native | Native | Via OTEL exporter | Via agent |
| Sampling | Rules | Inherited from X-Ray | Per collector config | Agent-side |
| Retention | 30 days | 30 days (metrics longer) | Configurable (self-hosted) | Configurable (paid) |
| Cost | Per trace | X-Ray + Signals | Infra only | Per host / span |

X-Ray vs OpenTelemetry: OpenTelemetry is the open standard; X-Ray is AWS's backend that can receive OTEL data through ADOT. For new apps, instrument with OpenTelemetry and export to X-Ray (and/or other backends) — this avoids lock-in while keeping native AWS integration.

Exam Relevance

  • Developer Associate (DVA-C02) — heavy coverage: enabling X-Ray on Lambda (active tracing flag), API Gateway, ECS daemon sidecar, annotations vs metadata, sampling rules.
  • Solutions Architect Associate (SAA-C03) — X-Ray as the AWS answer for distributed tracing across microservices.
  • DevOps Professional (DOP-C02) — X-Ray in CI/CD pipelines (canary analysis), correlation with CloudWatch Logs, cross-account trace aggregation.
  • SysOps Administrator (SOA-C02) — X-Ray daemon deployment, sampling rule operations, integration with Application Signals for SLO monitoring.

Exam trap: X-Ray vs CloudTrail. CloudTrail logs AWS API calls (the control plane: "who created this S3 bucket?"). X-Ray traces application requests (the data plane: "where did this checkout spend 2 seconds?"). They are complementary.

Frequently Asked Questions

Q: What's the difference between X-Ray and CloudWatch Logs?

A: CloudWatch Logs store text or structured log lines emitted by applications — one entry per log call. X-Ray captures the shape of a request as it flows through a distributed system: which services it touched, in what order, with what latency and errors. Logs answer "what did this service say?"; X-Ray answers "how did this request travel?" They are most powerful together: enable the "send trace ID to logs" integration, and from any X-Ray trace you can jump directly to the CloudWatch Logs lines written during that specific request, and vice versa. CloudWatch Application Signals builds a unified SLO view on top of both.

Q: How does sampling work, and how do I control costs?

A: X-Ray samples at the SDK / daemon level based on sampling rules — the default rule records the first request per second plus a fixed percentage (5%) of additional requests per service. You can define custom rules in the X-Ray console or via IaC: match by ServiceName, HTTPMethod, URLPath, and Host, then specify a reservoir (N per second, always sampled) and a fixed rate (percentage of remaining requests). Record high-value endpoints (checkout, payments) at 100%, sample high-volume health checks at near zero. Sampling decisions propagate via the trace header, so downstream services honor the parent decision — costs scale with the sampled, not total, request rate.
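Rule selection can be sketched as "lowest priority number that matches wins". The example below uses the sampling-rule field names mentioned above (ServiceName, HTTPMethod, URLPath, Host, plus Priority, ReservoirSize, FixedRate) with wildcard matching from the standard library. The three rules, their priorities, and the match_rule helper are illustrative assumptions, not an export of real configuration.

```python
from fnmatch import fnmatchcase

MATCH_FIELDS = ("ServiceName", "HTTPMethod", "URLPath", "Host")

# Hypothetical rule set: trace all checkouts, almost no health checks, default for the rest.
RULES = [
    {"RuleName": "Default", "Priority": 10000, "ServiceName": "*", "HTTPMethod": "*",
     "URLPath": "*", "Host": "*", "ReservoirSize": 1, "FixedRate": 0.05},
    {"RuleName": "health-checks", "Priority": 20, "ServiceName": "*", "HTTPMethod": "GET",
     "URLPath": "/health", "Host": "*", "ReservoirSize": 0, "FixedRate": 0.01},
    {"RuleName": "checkout-all", "Priority": 10, "ServiceName": "*", "HTTPMethod": "POST",
     "URLPath": "/checkout*", "Host": "*", "ReservoirSize": 1, "FixedRate": 1.0},
]


def match_rule(rules, request):
    """Return the first rule, in ascending priority order, whose patterns all match."""
    for rule in sorted(rules, key=lambda r: r["Priority"]):
        if all(fnmatchcase(request[field], rule[field]) for field in MATCH_FIELDS):
            return rule
    return None


checkout = {"ServiceName": "checkout-fn", "HTTPMethod": "POST",
            "URLPath": "/checkout/confirm", "Host": "api.example.com"}
health = {"ServiceName": "checkout-fn", "HTTPMethod": "GET",
          "URLPath": "/health", "Host": "api.example.com"}
other = {"ServiceName": "catalog-svc", "HTTPMethod": "GET",
         "URLPath": "/items/7", "Host": "api.example.com"}

chosen = [match_rule(RULES, request)["RuleName"] for request in (checkout, health, other)]
```

Giving specific rules low priority numbers and leaving the catch-all at a high number keeps the high-value and low-value paths from falling through to the default rate.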

Q: Do I need to use the X-Ray SDK, or can I use OpenTelemetry?

A: Both are supported. The X-Ray SDK is AWS-specific, auto-instruments AWS SDK calls, HTTP clients, and popular ORMs, and is the quickest path for pure AWS deployments. OpenTelemetry, via the AWS Distro for OpenTelemetry (ADOT) collector, is the recommended path for new applications because it's vendor-neutral: you can export the same traces to X-Ray, Datadog, Honeycomb, or a self-hosted Jaeger/Tempo simultaneously. ADOT preserves the AWS propagation header plus W3C Trace Context, so AWS-native services remain fully integrated. For brownfield applications already instrumented with the X-Ray SDK, keep the SDK in place and migrate to ADOT when you need multi-backend flexibility.
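The two propagation formats carry the same information in different shapes, which a short conversion sketch makes visible. The functions below are hypothetical helpers, not ADOT APIs: they map a W3C traceparent value onto the X-Amzn-Trace-Id layout by splitting the 32-hex-digit trace ID into an 8-digit and a 24-digit part. X-Ray has traditionally expected the first 8 digits to encode a timestamp, which is why OpenTelemetry setups that export to X-Ray typically use an X-Ray-compatible ID generator.

```python
def w3c_to_xray_trace_id(trace_id_hex):
    """Reshape a 32-hex-digit W3C trace ID into the 1-<8 hex>-<24 hex> X-Ray form."""
    if len(trace_id_hex) != 32:
        raise ValueError("W3C trace IDs are 32 hex digits")
    int(trace_id_hex, 16)  # raises ValueError if not valid hex
    return f"1-{trace_id_hex[:8]}-{trace_id_hex[8:]}"


def traceparent_to_xray_header(traceparent):
    """Convert 'version-traceid-parentid-flags' into 'Root=...;Parent=...;Sampled=...'."""
    _version, trace_id, parent_id, flags = traceparent.split("-")
    sampled = int(flags, 16) & 0x01  # lowest flag bit is the sampled flag
    return f"Root={w3c_to_xray_trace_id(trace_id)};Parent={parent_id};Sampled={sampled}"


# Hypothetical incoming W3C header value.
traceparent = "00-5759e988bd862e3fe1be46a994272793-53995c3f42cd8ad8-01"
xray_header = traceparent_to_xray_header(traceparent)
print(xray_header)
```

The trace ID, parent ID, and sampling decision all survive the conversion, which is what lets a request cross between OpenTelemetry-instrumented code and AWS-native services as a single trace.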


This article reflects AWS features and pricing as of 2026. AWS services evolve rapidly — always verify against the official AWS X-Ray documentation before making production decisions.

Published: 4/17/2026

