Amazon CloudWatch: What It Is and When to Use It
Definition
Amazon CloudWatch is AWS's unified monitoring and observability service. It collects metrics, logs, traces (via AWS X-Ray), and events from AWS services, your applications, and on-premises workloads — and exposes them through alarms, dashboards, and query tools. CloudWatch is on by default for most AWS services: EC2, RDS, Lambda, ECS, EKS, S3, and hundreds more automatically publish metrics and (for compute services) logs.
How It Works
CloudWatch is split into several related products that share a common data plane:
- CloudWatch Metrics — numeric time-series data with dimensions (e.g., CPUUtilization by InstanceId). Standard resolution is 1 minute; high-resolution metrics go down to 1 second.
- CloudWatch Alarms — fire actions when a metric breaches a threshold for a configured number of datapoints. Actions include SNS notifications, EC2 and Auto Scaling actions, Systems Manager OpsItems, and Lambda invocations.
- CloudWatch Logs — log groups (one per source) contain log streams (one per log producer). Logs are retained for a configurable period, encrypted, and searchable with Logs Insights.
- CloudWatch Logs Insights — a purpose-built query language for fast log analysis across log groups.
- CloudWatch Dashboards — customizable widgets that visualize metrics and log queries.
- CloudWatch Events / Amazon EventBridge — originally "CloudWatch Events," this evolved into the standalone EventBridge service, which remains deeply linked with CloudWatch.
- CloudWatch Synthetics — scripted "canaries" that check endpoints and UI flows from AWS-hosted browsers.
- CloudWatch RUM (Real User Monitoring) — browser-side performance and error telemetry.
- Container Insights — detailed metrics and logs for ECS, EKS, and self-managed Kubernetes.
- Lambda Insights — memory, CPU, and network metrics per Lambda function.
- Application Signals — APM-style service maps and SLO tracking (newer feature).
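The "threshold for N datapoints" rule in the Alarms bullet can be illustrated with a small simulation. This is a simplified sketch, not CloudWatch's actual evaluator: it assumes a greater-than comparison and ignores missing-data handling. The function name `alarm_state` and the sample values are hypothetical; the parameter names mirror the `EvaluationPeriods` and `DatapointsToAlarm` alarm settings.

```python
from typing import Sequence


def alarm_state(datapoints: Sequence[float], threshold: float,
                evaluation_periods: int, datapoints_to_alarm: int) -> str:
    """Evaluate the newest window of datapoints against a fixed threshold.

    Simplified model: the alarm fires when at least `datapoints_to_alarm`
    of the last `evaluation_periods` datapoints are above the threshold.
    """
    if len(datapoints) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    window = datapoints[-evaluation_periods:]
    breaching = sum(1 for value in window if value > threshold)
    return "ALARM" if breaching >= datapoints_to_alarm else "OK"


# Hypothetical CPUUtilization samples, one per period.
cpu = [42.0, 91.5, 88.0, 95.2, 60.1]
print(alarm_state(cpu, threshold=80.0, evaluation_periods=5, datapoints_to_alarm=3))
```

Requiring several breaching datapoints rather than one is what keeps a single spike from paging anyone.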
Key Features and Limits
- Default metrics — AWS services publish baseline metrics at no charge. EC2 basic monitoring reports every 5 minutes; enabling "detailed monitoring" (paid) brings EC2 down to 1-minute resolution. Many other services publish at 1-minute resolution by default.
- Custom metrics — put your application metrics into CloudWatch with the PutMetricData API.
- High-resolution metrics — down to 1-second granularity; useful for spiky workloads and sub-minute alarms.
- Log retention — configurable from 1 day to forever (default is "never expire," which is a common cost leak).
- Log subscriptions — stream logs to Kinesis Data Streams / Firehose / Lambda for real-time processing.
- Metric filters — extract numeric metrics from log text.
- Composite alarms — combine multiple alarms with AND/OR/NOT to reduce noise.
- Anomaly detection alarms — ML-based dynamic thresholds instead of fixed values.
- CloudWatch Agent — unified agent for EC2 and on-premises servers to publish system-level metrics and logs.
- Embedded Metric Format (EMF) — structured logs that CloudWatch automatically turns into metrics, avoiding extra PutMetricData calls.
- Integrations — native integration with AWS Config, AWS Organizations, Security Hub, GuardDuty, AWS Backup, AWS Chatbot (for Slack/Teams), and X-Ray.
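To make the Embedded Metric Format bullet concrete, here is a minimal sketch of what one EMF log line might look like when built by hand. The helper name `emf_record`, the `MyApp` namespace, and the `Service` dimension are illustrative assumptions; in practice an EMF client library or Lambda Powertools would typically generate this for you.

```python
import json
import time


def emf_record(namespace, metric_name, value, unit, dimensions):
    """Build one EMF log line: the metric value plus `_aws` metadata describing it."""
    record = {
        "_aws": {
            "Timestamp": int(time.time() * 1000),  # epoch milliseconds
            "CloudWatchMetrics": [{
                "Namespace": namespace,
                "Dimensions": [list(dimensions)],  # a single dimension set (key names only)
                "Metrics": [{"Name": metric_name, "Unit": unit}],
            }],
        },
        metric_name: value,  # the metric value lives at the top level
    }
    record.update(dimensions)  # dimension values also live at the top level
    return json.dumps(record)


# Hypothetical metric: request latency for a "checkout" service.
line = emf_record("MyApp", "Latency", 123.4, "Milliseconds", {"Service": "checkout"})
print(line)  # write this line to stdout or a log stream that CloudWatch Logs ingests
```

The same line serves as both a searchable log entry and a metric source, which is why no separate PutMetricData call is needed.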
Common Use Cases
- Infrastructure monitoring — CPU, memory, disk, network across the fleet.
- Application performance monitoring — custom metrics + logs + traces tied to requests.
- Alerting — CloudWatch Alarms → SNS → email / SMS / PagerDuty.
- Auto scaling — CloudWatch metrics drive EC2 Auto Scaling Groups, ECS Service Auto Scaling, DynamoDB auto-scaling, Aurora Auto Scaling.
- Log analytics — Logs Insights queries across ALB, CloudFront, VPC Flow Logs, Lambda, application logs.
- Synthetic uptime and journey testing — Synthetics canaries verify endpoints and login flows.
- Container observability — Container Insights for ECS / EKS metrics and logs per container.
- Compliance — CloudWatch Logs is the typical destination for CloudTrail, VPC Flow Logs, and audit trails.
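For the log analytics use case, a Logs Insights query might look like the sketch below. It only builds the request parameters; the log group name is a hypothetical placeholder, and the resulting dictionary is shaped for the boto3 CloudWatch Logs `start_query` call, which is deliberately not invoked here so the example runs without AWS credentials.

```python
import time

# Count log lines containing "ERROR" in 5-minute buckets, busiest first.
QUERY = """
fields @timestamp, @message
| filter @message like /ERROR/
| stats count(*) as errors by bin(5m)
| sort errors desc
| limit 20
""".strip()


def start_query_params(log_group, minutes_back, now=None):
    """Return keyword arguments for a Logs Insights query over a recent time window."""
    end = int(now if now is not None else time.time())
    return {
        "logGroupName": log_group,
        "startTime": end - minutes_back * 60,  # epoch seconds
        "endTime": end,
        "queryString": QUERY,
    }


# "/my-app/production" is a made-up log group name.
params = start_query_params("/my-app/production", minutes_back=60, now=1_700_000_000)
print(params["startTime"], params["endTime"])
# With boto3 installed, this could be passed as: boto3.client("logs").start_query(**params)
```

Because queries are billed by data scanned, narrowing the time window is usually the cheapest optimization.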
Pricing Model
CloudWatch bills each product separately, based on usage. The most common bill lines:
- Metrics — per custom metric-month. First 10 metrics are free. Detailed monitoring on EC2 is paid.
- Alarms — per alarm-month. A composite alarm is billed as one alarm at its own rate, in addition to the alarms it references.
- Logs ingestion — per GB of logs ingested, with a separate (cheaper) rate for Logs Infrequent Access tier.
- Logs storage — per GB-month of stored log data.
- Logs Insights queries — per GB of data scanned.
- Dashboards — first 3 dashboards free; after that, a monthly fee per dashboard.
- Synthetics — per canary run.
- RUM — per event.
- Container Insights / Application Signals / Contributor Insights — separate per-resource or per-event charges.
The AWS Free Tier includes 10 metrics, 10 alarms, 5 GB of log ingestion, 3 dashboards, and 1,000,000 API requests per month.
Common cost leaks: never-expiring log retention, high-cardinality custom metrics, and running dashboards in every Region.
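The bill lines above can be combined into a rough monthly estimate. The sketch below is a back-of-the-envelope model only: the rates are placeholders rather than AWS list prices, the free-tier allowances are the ones quoted above, and dashboards, Synthetics, RUM, and the Insights products are left out. The names `Rates` and `monthly_cost` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Per-unit monthly prices in USD. Placeholder values, not AWS list prices."""
    custom_metric: float
    alarm: float
    gb_ingested: float
    gb_stored: float
    gb_scanned: float


def monthly_cost(rates, metrics, alarms, gb_ingested, gb_stored, gb_scanned):
    """Estimate a monthly bill after subtracting the free-tier allowances."""
    billable_metrics = max(metrics - 10, 0)    # 10 metrics free
    billable_alarms = max(alarms - 10, 0)      # 10 alarms free
    billable_ingest = max(gb_ingested - 5, 0)  # 5 GB of log ingestion free
    return (billable_metrics * rates.custom_metric
            + billable_alarms * rates.alarm
            + billable_ingest * rates.gb_ingested
            + gb_stored * rates.gb_stored
            + gb_scanned * rates.gb_scanned)


# Illustrative rates and usage; substitute current prices for your Region.
rates = Rates(custom_metric=0.30, alarm=0.10, gb_ingested=0.50,
              gb_stored=0.03, gb_scanned=0.005)
total = monthly_cost(rates, metrics=110, alarms=60,
                     gb_ingested=105, gb_stored=1000, gb_scanned=2000)
print(f"{total:.2f}")
```

Even with made-up rates, the shape of the result is typical: ingestion and storage tend to outweigh metrics and alarms once log volume grows.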
Pros and Cons
Pros
- Zero setup for the baseline: every AWS service publishes metrics automatically.
- Unified across compute, networking, storage, databases, containers.
- Rich alarm and composite-alarm capabilities.
- Logs Insights is fast and uses a simple purpose-built DSL.
- Direct integration with EventBridge, Lambda, SNS, and Auto Scaling.
Cons
- Log ingest and long retention can become the dominant AWS bill line if unconfigured.
- Per-metric cost scales with cardinality — expensive if you create metrics per user or per request.
- Dashboards are functional but less polished than Grafana / Datadog.
- Cross-Region dashboards and alarms require extra configuration.
- No full APM distributed tracing — that's X-Ray's job.
Comparison with Alternatives
|  | CloudWatch | Managed Grafana / Prometheus | Datadog / New Relic |
| --- | --- | --- | --- |
| Source | Native on AWS services | Grafana + Prometheus + AMP | SaaS agents |
| Logs | Yes (Logs) | Loki / CloudWatch | Yes |
| Traces | X-Ray integration | Tempo | Yes (APM) |
| Dashboards | Built-in | Grafana (very polished) | Native (very polished) |
| Cost at scale | AWS bill line | Lower for high-cardinality, but operational overhead | Higher; most polished UX |
| Best for | Default observability on AWS | Teams wanting open-source stack | Polished SaaS APM across clouds |
Exam Relevance
- Cloud Practitioner (CLF-C02) — know CloudWatch is the monitoring service and that it provides metrics, logs, and alarms.
- Solutions Architect Associate (SAA-C03) — metric-driven Auto Scaling, CloudWatch Alarms → SNS → email/SMS, Logs as audit trail, integration with Lambda.
- Developer Associate (DVA-C02) — custom metrics via PutMetricData, CloudWatch Agent on EC2, Embedded Metric Format, Lambda Insights, structured JSON logging.
- SysOps Administrator (SOA-C02) — heavy coverage: agent configuration, log retention, composite alarms, Synthetics canaries, Contributor Insights, CloudWatch cost optimization.
- DevOps Professional (DOP-C02) — SLO monitoring with Application Signals, automated remediation via EventBridge + Lambda, deployment canaries.
Frequently Asked Questions
Q: What is the difference between CloudWatch and CloudTrail?
A: CloudWatch monitors operational data ("what is happening, and how is it performing?"): metrics, logs, alarms, and traces. CloudTrail is an audit log ("who did what, when, from where?") that records AWS API activity as CloudTrail events (management events by default; data events are opt-in). The two are complementary and often used together: CloudTrail can be configured to deliver its events to CloudWatch Logs, where metric filters and alarms then generate alerts on suspicious patterns such as a spike in failed ConsoleLogin events.
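The CloudTrail-to-alarm pattern in this answer usually starts with a metric filter on the CloudTrail log group. The sketch below builds parameters shaped for the boto3 CloudWatch Logs `put_metric_filter` call without invoking it. The log group, filter, metric, and namespace names are hypothetical; the filter pattern is a commonly used one for failed console sign-ins and should be checked against your own CloudTrail events before use.

```python
# JSON filter pattern: console sign-in events that failed authentication.
FAILED_LOGIN_PATTERN = (
    '{ ($.eventName = "ConsoleLogin") && ($.errorMessage = "Failed authentication") }'
)


def failed_login_filter_params(log_group):
    """Return keyword arguments that count each failed console login as a metric value of 1."""
    return {
        "logGroupName": log_group,
        "filterName": "ConsoleLoginFailures",        # hypothetical name
        "filterPattern": FAILED_LOGIN_PATTERN,
        "metricTransformations": [{
            "metricName": "ConsoleLoginFailureCount",  # hypothetical name
            "metricNamespace": "Security/CloudTrail",  # hypothetical namespace
            "metricValue": "1",   # emit 1 for every matching log event
            "defaultValue": 0.0,  # report 0 when nothing matches
        }],
    }


# "CloudTrail/management-events" is a made-up log group name.
filter_params = failed_login_filter_params("CloudTrail/management-events")
print(filter_params["filterPattern"])
# With boto3 installed: boto3.client("logs").put_metric_filter(**filter_params)
```

An alarm on the resulting metric, routed to SNS, completes the alerting path.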
Q: How much does CloudWatch Logs cost, and how do I keep it in check?
A: CloudWatch Logs bills you three ways: ingestion (per GB ingested), storage (per GB-month), and Logs Insights queries (per GB scanned). Common cost-optimization tactics: set explicit retention periods on every log group (30 / 90 / 365 days depending on compliance needs), turn down verbose log levels in production, use the Logs Infrequent Access tier for logs that rarely need to be queried, sample high-volume logs (e.g., keep 1% of successful requests), and export older logs to S3 (where storage is up to 20× cheaper).
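The first tactic, explicit retention on every log group, is easy to audit. The sketch below works on data shaped like the `logGroups` list returned by the boto3 `describe_log_groups` call, where a group that never expires has no `retentionInDays` key. The sample groups and the function name are hypothetical, and no AWS call is made.

```python
def groups_without_retention(log_groups):
    """Return names of log groups that never expire, largest stored size first."""
    never_expire = [g for g in log_groups if "retentionInDays" not in g]
    never_expire.sort(key=lambda g: g.get("storedBytes", 0), reverse=True)
    return [g["logGroupName"] for g in never_expire]


# Hypothetical sample shaped like describe_log_groups()["logGroups"].
sample_groups = [
    {"logGroupName": "/aws/lambda/orders", "storedBytes": 5_000_000},
    {"logGroupName": "/my-app/api", "retentionInDays": 30, "storedBytes": 9_000_000},
    {"logGroupName": "/aws/lambda/billing", "storedBytes": 12_000_000},
]

for name in groups_without_retention(sample_groups):
    # With boto3 installed, each could be fixed with:
    # boto3.client("logs").put_retention_policy(logGroupName=name, retentionInDays=90)
    print(name)
```

Sorting by stored size puts the most expensive offenders at the top of the list.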
Q: When should I use Metric Filters vs Embedded Metric Format (EMF)?
A: Metric Filters extract numeric values from log lines already in CloudWatch Logs and turn them into CloudWatch metrics. They are useful when logs are already being ingested and you don't control their format (e.g., VPC Flow Logs or third-party application logs). Embedded Metric Format (EMF) is a structured JSON format you emit from your application; CloudWatch parses it into metrics automatically, with no separate PutMetricData calls. It is the modern and usually cheaper path because it avoids PutMetricData API request charges (the resulting custom metrics are still billed per metric) and keeps metrics and logs aligned. For new applications, prefer EMF.
This article reflects AWS features and pricing as of 2026. AWS services evolve rapidly — always verify against the official Amazon CloudWatch documentation before making production decisions.