Amazon CloudWatch: What It Is and When to Use It

Definition

Amazon CloudWatch is AWS's unified monitoring and observability service. It collects metrics, logs, traces (via AWS X-Ray), and events from AWS services, your applications, and on-premises workloads, and exposes them through alarms, dashboards, and query tools. Basic metrics are on by default for most AWS services: EC2, RDS, Lambda, ECS, EKS, S3, and many others publish metrics automatically. Logs are less uniform: Lambda writes to CloudWatch Logs out of the box (given the usual execution-role permissions), while EC2 instances and most other sources need the CloudWatch Agent or explicit log configuration.

How It Works

CloudWatch is split into several related products that share a common data plane:

  • CloudWatch Metrics — numeric time-series data with dimensions (e.g., CPUUtilization by InstanceId). Standard resolution is 1-minute; high-resolution metrics go down to 1-second.
  • CloudWatch Alarms — fire actions when a metric breaches a threshold for M out of the last N datapoints. Actions include SNS notifications, EC2 actions (stop, terminate, reboot, recover), EC2 Auto Scaling policies, Systems Manager OpsItems, and Lambda functions.
  • CloudWatch Logs — log groups (one per source) contain log streams (one per log producer). Logs are retained for a configurable period, encrypted, and searchable with Logs Insights.
  • CloudWatch Logs Insights — a purpose-built query language for fast log analysis across log groups.
  • CloudWatch Dashboards — customizable widgets that visualize metrics and log queries.
  • CloudWatch Events / Amazon EventBridge — originally "CloudWatch Events," this evolved into the standalone EventBridge service, which remains deeply linked with CloudWatch.
  • CloudWatch Synthetics — scripted "canaries" that check endpoints and UI flows from AWS-hosted browsers.
  • CloudWatch RUM (Real User Monitoring) — browser-side performance and error telemetry.
  • Container Insights — detailed metrics and logs for ECS, EKS, and self-managed Kubernetes.
  • Lambda Insights — memory, CPU, and network metrics per Lambda function.
  • Application Signals — APM-style service maps and SLO tracking (newer feature).
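Alarm evaluation can be pictured as a sliding window: the alarm fires when at least M of the last N datapoints breach the threshold. The sketch below is a simplified local model of that rule, not the service's actual implementation. It assumes a "greater than" comparison and simply skips missing datapoints, whereas real alarms let you choose how missing data is treated. The function name `alarm_state` and the sample values are illustrative.

```python
from typing import Optional, Sequence


def alarm_state(
    datapoints: Sequence[Optional[float]],
    threshold: float,
    evaluation_periods: int,
    datapoints_to_alarm: int,
) -> str:
    """Return "ALARM", "OK", or "INSUFFICIENT_DATA" for the most recent window.

    Simplified model: uses a ">" comparison only and ignores missing
    datapoints (None) instead of modelling the missing-data options.
    """
    window = datapoints[-evaluation_periods:]
    present = [value for value in window if value is not None]
    if not present:
        return "INSUFFICIENT_DATA"
    breaching = sum(1 for value in present if value > threshold)
    return "ALARM" if breaching >= datapoints_to_alarm else "OK"


# Hypothetical CPUUtilization samples, oldest first.
cpu = [42.0, 91.5, 88.0, 95.2, 60.1]

# 3 of the last 5 datapoints exceed 80, so a "3 out of 5" alarm fires.
print(alarm_state(cpu, threshold=80.0, evaluation_periods=5, datapoints_to_alarm=3))  # prints ALARM
```

Requiring several breaching datapoints rather than one is the usual way to keep a single spike from paging anyone.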

Key Features and Limits

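  • Default metrics — most AWS services publish baseline metrics at no charge. The default period varies by service: EC2 basic monitoring reports every 5 minutes, or every 1 minute if "detailed monitoring" is enabled (paid), while many other services report at 1-minute resolution by default.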
  • Custom metrics — put your application metrics into CloudWatch with the PutMetricData API.
  • High-resolution metrics — down to 1-second granularity; useful for spiky workloads and sub-minute alarms.
  • Log retention — configurable from 1 day to 10 years, or never expire (the default, and a common cost leak).
  • Log subscriptions — stream logs to Kinesis Data Streams / Firehose / Lambda for real-time processing.
  • Metric filters — extract numeric metrics from log text.
  • Composite alarms — combine multiple alarms with AND/OR/NOT to reduce noise.
  • Anomaly detection alarms — ML-based dynamic thresholds instead of fixed values.
  • CloudWatch Agent — unified agent for EC2 and on-premises servers to publish system-level metrics and logs.
  • Embedded Metric Format (EMF) — structured logs that CloudWatch automatically turns into metrics, avoiding extra PutMetricData calls.
  • Integrations — native integration with AWS Config, AWS Organizations, Security Hub, GuardDuty, AWS Backup, AWS Chatbot (since renamed Amazon Q Developer in chat applications; used for Slack/Teams), and X-Ray.
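As a rough illustration of the custom-metric and high-resolution features listed above, the following sketch builds the request for the PutMetricData API as a plain dictionary and sends it with boto3 only when asked to. The namespace, metric name, and dimension values are hypothetical, and `publish` assumes boto3 is installed and AWS credentials are configured.

```python
from datetime import datetime, timezone
from typing import Any, Dict


def build_put_metric_data(
    namespace: str,
    name: str,
    value: float,
    unit: str,
    dimensions: Dict[str, str],
    high_resolution: bool = False,
) -> Dict[str, Any]:
    """Build keyword arguments for the CloudWatch PutMetricData API."""
    return {
        "Namespace": namespace,
        "MetricData": [
            {
                "MetricName": name,
                "Dimensions": [
                    {"Name": key, "Value": val}
                    for key, val in sorted(dimensions.items())
                ],
                "Timestamp": datetime.now(timezone.utc),
                "Value": value,
                "Unit": unit,
                # 1 = high resolution (1-second), 60 = standard resolution.
                "StorageResolution": 1 if high_resolution else 60,
            }
        ],
    }


def publish(kwargs: Dict[str, Any]) -> None:
    """Send the payload with boto3 (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder works without boto3

    boto3.client("cloudwatch").put_metric_data(**kwargs)


request = build_put_metric_data(
    namespace="MyApp/Checkout",  # hypothetical namespace
    name="OrderLatency",         # hypothetical metric name
    value=182.0,
    unit="Milliseconds",
    dimensions={"Service": "checkout", "Stage": "prod"},
    high_resolution=True,
)
print(request["MetricData"][0]["StorageResolution"])  # prints 1
```

Every distinct combination of dimension values becomes its own billable metric, so dimensions such as a user ID or request ID are best kept out of this payload.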

Common Use Cases

  1. Infrastructure monitoring — CPU, memory, disk, network across the fleet.
  2. Application performance monitoring — custom metrics + logs + traces tied to requests.
  3. Alerting — CloudWatch Alarms → SNS → email / SMS / PagerDuty.
  4. Auto scaling — CloudWatch metrics drive EC2 Auto Scaling Groups, ECS Service Auto Scaling, DynamoDB auto-scaling, Aurora Auto Scaling.
  5. Log analytics — Logs Insights queries across ALB, CloudFront, VPC Flow Logs, Lambda, application logs.
  6. Synthetic uptime and journey testing — Synthetics canaries verify endpoints and login flows.
  7. Container observability — Container Insights for ECS / EKS metrics and logs per container.
  8. Compliance — CloudWatch Logs is the typical destination for CloudTrail, VPC Flow Logs, and audit trails.
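For the log analytics use case, a Logs Insights query is a pipeline of commands separated by `|`. The sketch below composes a simple "count matching lines per time bucket" query and the keyword arguments for the StartQuery API. The helper names and the log group name are hypothetical; actually running the query would require boto3's `logs` client and polling for results, which is omitted here.

```python
from typing import Any, Dict, List


def error_count_query(pattern: str = "ERROR", bin_minutes: int = 5, limit: int = 20) -> str:
    """Compose a Logs Insights query that counts matching lines per time bucket."""
    return " | ".join(
        [
            "fields @timestamp, @message",
            f"filter @message like /{pattern}/",
            f"stats count(*) as matches by bin({bin_minutes}m)",
            "sort matches desc",
            f"limit {limit}",
        ]
    )


def build_start_query(
    log_groups: List[str], start_epoch: int, end_epoch: int, query: str
) -> Dict[str, Any]:
    """Keyword arguments for the CloudWatch Logs StartQuery API (epoch seconds)."""
    return {
        "logGroupNames": log_groups,
        "startTime": start_epoch,
        "endTime": end_epoch,
        "queryString": query,
    }


query = error_count_query()
params = build_start_query(
    ["/aws/lambda/my-function"],  # hypothetical log group
    start_epoch=1_700_000_000,
    end_epoch=1_700_003_600,
    query=query,
)
print(query)
```

Because queries are billed by data scanned, narrowing the time range and the list of log groups is usually the cheapest optimization available.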

Pricing Model

CloudWatch bills each kind of data separately. The most common bill lines:

  • Metrics — per custom metric-month. First 10 metrics are free. Detailed monitoring on EC2 is paid.
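  • Alarms — per alarm-month, with higher rates for high-resolution and anomaly detection alarms. A composite alarm is billed as one alarm, in addition to the alarms it combines.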
  • Logs ingestion — per GB of logs ingested, with a separate (cheaper) rate for Logs Infrequent Access tier.
  • Logs storage — per GB-month of stored log data.
  • Logs Insights queries — per GB of data scanned.
  • Dashboards — first 3 dashboards free; after that, a monthly fee per dashboard.
  • Synthetics — per canary run.
  • RUM — per event.
  • Container Insights / Application Signals / Contributor Insights — separate per-resource or per-event charges.

The AWS Free Tier includes 10 metrics, 10 alarms, 5 GB of log ingestion, 3 dashboards, and 1,000,000 API requests per month.

Common cost leaks: never-expiring log retention, high-cardinality custom metrics, and duplicating dashboards per Region (dashboards are global and can chart metrics from any Region, so per-Region copies only add cost).
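To make the billing arithmetic concrete, here is a back-of-the-envelope estimator covering four of the bill lines above. The unit prices in `Rates` are illustrative placeholders, not quoted AWS prices, and should be replaced with figures from the official pricing page for your Region. The Free Tier allowances are the ones listed in this article, applied here to ingestion only as a simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Placeholder unit prices in USD. NOT authoritative AWS prices."""

    per_metric: float = 0.30
    per_alarm: float = 0.10
    per_gb_ingested: float = 0.50
    per_dashboard: float = 3.00


@dataclass(frozen=True)
class Usage:
    custom_metrics: int
    alarms: int
    gb_ingested: float
    dashboards: int


# Free Tier allowances as listed in this article.
FREE_METRICS, FREE_ALARMS, FREE_GB, FREE_DASHBOARDS = 10, 10, 5.0, 3


def estimate_monthly_cost(usage: Usage, rates: Rates = Rates()) -> float:
    """Sum the billable portion of each line after subtracting the free allowance."""
    total = (
        max(0, usage.custom_metrics - FREE_METRICS) * rates.per_metric
        + max(0, usage.alarms - FREE_ALARMS) * rates.per_alarm
        + max(0.0, usage.gb_ingested - FREE_GB) * rates.per_gb_ingested
        + max(0, usage.dashboards - FREE_DASHBOARDS) * rates.per_dashboard
    )
    return round(total, 2)


# Hypothetical month: 110 custom metrics, 25 alarms, 105 GB of logs, 4 dashboards.
example = Usage(custom_metrics=110, alarms=25, gb_ingested=105.0, dashboards=4)
print(estimate_monthly_cost(example))
```

The sketch leaves out log storage, Logs Insights scans, Synthetics, and RUM, so it understates a real bill; its purpose is only to show how the per-unit lines add up.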

Pros and Cons

Pros

  • Zero setup for the baseline: every AWS service publishes metrics automatically.
  • Unified across compute, networking, storage, databases, containers.
  • Rich alarm and composite-alarm capabilities.
  • Logs Insights is fast and uses a simple purpose-built DSL.
  • Direct integration with EventBridge, Lambda, SNS, and Auto Scaling.

Cons

  • Log ingest and long retention can become the dominant AWS bill line if unconfigured.
  • Per-metric cost scales with cardinality — expensive if you create metrics per user or per request.
  • Dashboards are functional but less polished than Grafana / Datadog.
  • Cross-Region dashboards and alarms require extra configuration.
  • No full APM distributed tracing — that's X-Ray's job.

Comparison with Alternatives

| | CloudWatch | Managed Grafana / Prometheus | Datadog / New Relic |
| --- | --- | --- | --- |
| Source | Native on AWS services | Grafana + Prometheus + AMP | SaaS agents |
| Logs | Yes (Logs) | Loki / CloudWatch | Yes |
| Traces | X-Ray integration | Tempo | Yes (APM) |
| Dashboards | Built-in | Grafana (very polished) | Native (very polished) |
| Cost at scale | AWS bill line | Lower for high-cardinality, but operational overhead | Higher; most polished UX |
| Best for | Default observability on AWS | Teams wanting open-source stack | Polished SaaS APM across clouds |

Exam Relevance

  • Cloud Practitioner (CLF-C02) — know CloudWatch is the monitoring service and that it provides metrics, logs, and alarms.
  • Solutions Architect Associate (SAA-C03) — metric-driven Auto Scaling, CloudWatch Alarms → SNS → email/SMS, Logs as audit trail, integration with Lambda.
  • Developer Associate (DVA-C02) — custom metrics via PutMetricData, CloudWatch Agent on EC2, Embedded Metric Format, Lambda Insights, structured JSON logging.
  • CloudOps Engineer Associate (SOA-C03, formerly SysOps Administrator SOA-C02) — heavy coverage: agent configuration, log retention, composite alarms, Synthetics canaries, Contributor Insights, CloudWatch cost optimization.
  • DevOps Professional (DOP-C02) — SLO monitoring with Application Signals, automated remediation via EventBridge + Lambda, deployment canaries.

Frequently Asked Questions

Q: What is the difference between CloudWatch and CloudTrail?

A: CloudWatch monitors operational data: metrics, logs, alarms, and traces that answer "what is happening, and how is it performing?" CloudTrail is an audit log that answers "who did what, when, and from where?" by recording AWS API calls as events (management events by default; data events are opt-in). The two are complementary and often used together: a trail can be configured to deliver its events to CloudWatch Logs, where metric filters and alarms then alert on suspicious patterns such as a spike in failed ConsoleLogin events.
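The CloudTrail-to-alarm pattern described in this answer can be sketched as a metric filter on the trail's log group. The filter pattern below follows the commonly used form for failed console sign-ins; the filter name, metric name, namespace, and log group name are hypothetical. The function only builds the keyword arguments for the PutMetricFilter API, and applying them would require boto3's `logs` client.

```python
from typing import Any, Dict

# JSON filter pattern matching failed console sign-ins in CloudTrail events.
FAILED_LOGIN_PATTERN = (
    '{ ($.eventName = ConsoleLogin) && ($.errorMessage = "Failed authentication") }'
)


def build_metric_filter(log_group: str) -> Dict[str, Any]:
    """Keyword arguments for the CloudWatch Logs PutMetricFilter API."""
    return {
        "logGroupName": log_group,
        "filterName": "ConsoleLoginFailures",  # hypothetical name
        "filterPattern": FAILED_LOGIN_PATTERN,
        "metricTransformations": [
            {
                "metricName": "ConsoleLoginFailureCount",  # hypothetical
                "metricNamespace": "Security/CloudTrail",   # hypothetical
                "metricValue": "1",   # add 1 for every matching event
                "defaultValue": 0.0,  # report 0 when nothing matches
            }
        ],
    }


filter_request = build_metric_filter("my-cloudtrail-log-group")  # hypothetical group
print(filter_request["filterPattern"])
```

An alarm on the resulting metric (for example, a sum above some small number within five minutes) then completes the alerting path through SNS.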

Q: How much does CloudWatch Logs cost, and how do I keep it in check?

A: Logs bill you three ways: ingestion (per GB ingested), storage (per GB-month), and Logs Insights queries (per GB scanned). Common cost-optimization tactics: set an explicit retention period on every log group (30 / 90 / 365 days depending on compliance needs), turn down verbose log levels in production, use the Logs Infrequent Access tier for logs that rarely need to be queried, sample high-volume logs (e.g., keep 1% of successful requests), and export older logs to S3, where lifecycle rules can move them to much cheaper archival storage classes.
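Setting explicit retention has one practical wrinkle: the API accepts only a fixed set of day counts, so a compliance requirement such as "keep 45 days" has to be rounded up to the next allowed value. A minimal sketch of that rounding, with hypothetical helper names, might look like this. It builds the keyword arguments for the PutRetentionPolicy API without calling AWS.

```python
from bisect import bisect_left
from typing import Any, Dict

# Retention periods (in days) accepted by CloudWatch Logs.
ALLOWED_RETENTION_DAYS = (
    1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545,
    731, 1096, 1827, 2192, 2557, 2922, 3288, 3653,
)


def retention_at_least(required_days: int) -> int:
    """Return the smallest allowed retention that still covers required_days."""
    if required_days < 1:
        raise ValueError("required_days must be at least 1")
    index = bisect_left(ALLOWED_RETENTION_DAYS, required_days)
    if index == len(ALLOWED_RETENTION_DAYS):
        raise ValueError("exceeds the longest fixed retention period")
    return ALLOWED_RETENTION_DAYS[index]


def build_put_retention_policy(log_group: str, required_days: int) -> Dict[str, Any]:
    """Keyword arguments for the CloudWatch Logs PutRetentionPolicy API."""
    return {
        "logGroupName": log_group,
        "retentionInDays": retention_at_least(required_days),
    }


# Hypothetical log group with a 45-day requirement, rounded up to 60 days.
print(build_put_retention_policy("/aws/lambda/my-function", 45))
```

Rounding up rather than down keeps the policy on the safe side of the compliance requirement at the cost of a little extra storage.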

Q: When should I use Metric Filters vs Embedded Metric Format (EMF)?

A: Metric Filters match patterns in log events that reach CloudWatch Logs and turn the matches into CloudWatch metrics. They are useful when logs are already being ingested and you don't control their format (e.g., VPC Flow Logs or output from third-party software). Embedded Metric Format (EMF) is a structured JSON format you emit from your application; CloudWatch parses it into metrics automatically, with no separate PutMetricData calls. It is the modern path and often the cheaper one: it avoids PutMetricData API request charges and keeps metrics and logs aligned, although you still pay for log ingestion and for each custom metric it creates. For new applications, prefer EMF.
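For reference, an EMF record is an ordinary JSON log line with an `_aws` metadata block that names which top-level fields are metrics and which are dimensions. The sketch below builds one such line by hand. The namespace, metric name, and dimension values are hypothetical, and in practice an EMF client library would usually generate this for you.

```python
import json
import time
from typing import Dict, Optional


def emf_record(
    namespace: str,
    metric_name: str,
    value: float,
    unit: str,
    dimensions: Dict[str, str],
    timestamp_ms: Optional[int] = None,
) -> str:
    """Serialize one metric value as an Embedded Metric Format log line."""
    now_ms = int(time.time() * 1000) if timestamp_ms is None else timestamp_ms
    document = {
        "_aws": {
            "Timestamp": now_ms,
            "CloudWatchMetrics": [
                {
                    "Namespace": namespace,
                    # One dimension set containing all dimension keys.
                    "Dimensions": [sorted(dimensions)],
                    "Metrics": [{"Name": metric_name, "Unit": unit}],
                }
            ],
        },
        # Dimension values and the metric value live at the top level.
        **dimensions,
        metric_name: value,
    }
    return json.dumps(document)


line = emf_record(
    namespace="MyApp/Checkout",  # hypothetical namespace
    metric_name="OrderLatency",  # hypothetical metric name
    value=182.0,
    unit="Milliseconds",
    dimensions={"Service": "checkout"},
    timestamp_ms=1_700_000_000_000,
)
print(line)  # written to stdout; on Lambda, stdout is shipped to CloudWatch Logs
```

Any extra top-level fields added to the document (a request ID, for instance) stay in the log line for querying without becoming metrics or dimensions.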


This article reflects AWS features and pricing as of 2026. AWS services evolve rapidly — always verify against the official Amazon CloudWatch documentation before making production decisions.

Published: 4/16/2026
