Amazon Bedrock: What It Is and When to Use It
Definition
Amazon Bedrock is a fully managed service that exposes leading foundation models (FMs) from Anthropic, Amazon, Meta, Mistral, Cohere, AI21 Labs, and Stability AI behind a single, serverless, AWS-native API. Rather than standing up GPU infrastructure or signing up with each model provider separately, developers call InvokeModel or Converse with a model ID and Bedrock handles inference, scaling, authentication (IAM + SigV4), observability (CloudWatch), and private networking (VPC endpoints). Bedrock is AWS's primary generative-AI platform and the default choice when an application needs to call an LLM on AWS.
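A minimal sketch of that call path with boto3's `bedrock-runtime` client, assuming credentials are configured and model access has been enabled in the Region; the model ID is an illustrative example, not a recommendation:

```python
# Minimal Converse call. Assumes AWS credentials and Region model access
# are already set up; swap in any modelId enabled for your account.
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock.converse(
    modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # example ID
    messages=[{"role": "user", "content": [{"text": "Summarize Amazon Bedrock in one sentence."}]}],
    inferenceConfig={"maxTokens": 256, "temperature": 0.2},
)

print(response["output"]["message"]["content"][0]["text"])
```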
How It Works
Bedrock is built around three layers:
- Model access layer — a per-Region catalog of FMs. You explicitly enable each model family for your account. Each model has a stable `modelId` (for example `anthropic.claude-opus-4-5-20250101-v1:0`) and optional cross-Region inference profiles that route traffic across multiple AWS Regions for higher throughput.
- Inference APIs — `InvokeModel` (streaming or non-streaming, raw provider-specific payload) and `Converse` (a unified chat/tool-use API that works identically across Claude, Nova, Llama, Mistral, and Cohere Command). The Converse API handles multi-turn conversation, system prompts, tool use, and JSON output in a provider-agnostic way; the sketch after this list shows the raw `InvokeModel` style for contrast.
- Application primitives — higher-level features layered on top of the model APIs: Knowledge Bases (managed RAG), Agents (tool-using planners), Guardrails (safety filters), Prompt Management, Model Evaluation, and model customization (fine-tuning + continued pre-training).
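For contrast with the unified Converse schema, here is a sketch of the lower-level `InvokeModel` path, where the request body is the provider's native payload (Anthropic's Messages format in this example; the model ID is illustrative):

```python
# InvokeModel sends the provider-specific body verbatim, so switching model
# families means rewriting this payload -- the main reason Converse exists.
import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

body = {
    "anthropic_version": "bedrock-2023-05-31",  # required for Anthropic models on Bedrock
    "max_tokens": 256,
    "messages": [{"role": "user", "content": "What is a cross-Region inference profile?"}],
}

response = bedrock.invoke_model(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",  # example ID
    body=json.dumps(body),
)

result = json.loads(response["body"].read())  # response body is a streaming blob
print(result["content"][0]["text"])
```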
Models available include Anthropic's Claude family (Haiku, Sonnet, Opus), Amazon Titan (text, embeddings, image), Amazon Nova (Micro, Lite, Pro, Premier — AWS's own FM family), Meta Llama, Mistral (7B, Large, Mixtral), Cohere (Command R/R+, Embed, Rerank), AI21 (Jamba), and Stability AI (image generation). Availability varies by Region — US East (N. Virginia) and US West (Oregon) carry the widest catalog.
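To check what your account can actually reach in a given Region, the `bedrock` control-plane client can enumerate the catalog; a short sketch (the provider filter is just an example):

```python
# List the foundation models visible in this Region, filtered by provider.
import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

models = bedrock.list_foundation_models(byProvider="anthropic")  # example filter
for model in models["modelSummaries"]:
    print(model["modelId"], model.get("inferenceTypesSupported", []))
```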
Key Features and Limits
- Converse API — single unified interface for chat-style interactions, tool use, streaming, and structured output across all compatible models.
- Knowledge Bases for RAG — managed retrieval-augmented generation; ingests documents from S3, SharePoint, Salesforce, Confluence, or a web crawler, then chunks them, embeds them (Titan or Cohere Embed), and stores the vectors in OpenSearch Serverless, Aurora PostgreSQL + pgvector, Pinecone, Redis Enterprise, or MongoDB Atlas.
- Agents for Bedrock — multi-step planners that decompose a user goal into tool calls (Lambda, Knowledge Bases, API schemas) and aggregate results.
- Guardrails — content filters (hate, violence, sexual content, insults), denied topics, PII redaction, contextual grounding checks, and word filters. Can be attached to any model call (see the sketch after this list).
- Prompt Management and Prompt Flows — versioned prompts and visual orchestration for multi-step chains.
- Model customization — continued pre-training and fine-tuning for supported models (for example Nova, Titan, Claude Haiku). Output is a private model hosted via Provisioned Throughput.
- Provisioned Throughput — reserved model units for predictable high-volume workloads, billed hourly (with optional 1- or 6-month commitments).
- Cross-Region inference profiles — automatically distribute requests across Regions to increase TPM/RPM quotas for the same model.
- VPC endpoints (PrivateLink) — keep traffic off the internet; data stays in the customer's Region and is not used to train models.
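As referenced in the Guardrails bullet above, a sketch of attaching a pre-created guardrail to a Converse call; the guardrail identifier, version, and model ID are all placeholders:

```python
# Attach an existing Guardrail to a single model call. Create the guardrail
# first (console or CreateGuardrail API); the IDs below are placeholders.
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock.converse(
    modelId="amazon.nova-lite-v1:0",  # example ID
    messages=[{"role": "user", "content": [{"text": "Draft a reply to this customer email."}]}],
    guardrailConfig={
        "guardrailIdentifier": "gr-EXAMPLE123",  # placeholder guardrail ID
        "guardrailVersion": "1",
    },
)

# stopReason is "guardrail_intervened" when a filter blocks input or output.
print(response["stopReason"])
```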
Limits
- Requests-per-minute (RPM) and tokens-per-minute (TPM) quotas are per account, per Region, per model. Default quotas are modest; cross-Region inference or Provisioned Throughput raises them. The retry sketch after this list shows one defensive pattern.
- Maximum prompt and output tokens depend on the model — Claude Sonnet and Opus support 200K context; Nova Pro up to 300K; Llama 3.1 up to 128K.
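A sketch of defensive quota handling, as mentioned above: boto3's adaptive retry mode backs off on throttling automatically, and the except branch shows what to do when retries run out (the model ID is illustrative):

```python
# Handle per-model RPM/TPM throttling. Bedrock signals quota pressure with
# a ThrottlingException; adaptive retries absorb short bursts.
import boto3
from botocore.config import Config
from botocore.exceptions import ClientError

config = Config(retries={"max_attempts": 8, "mode": "adaptive"})
bedrock = boto3.client("bedrock-runtime", config=config)

try:
    response = bedrock.converse(
        modelId="anthropic.claude-3-haiku-20240307-v1:0",  # example ID
        messages=[{"role": "user", "content": [{"text": "ping"}]}],
    )
    print(response["output"]["message"]["content"][0]["text"])
except ClientError as err:
    if err.response["Error"]["Code"] == "ThrottlingException":
        # Still throttled after retries: queue the request, route through a
        # cross-Region inference profile, or request a quota increase.
        raise
    raise
```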
Common Use Cases
- Chatbots and assistants — support bots, internal copilots, customer-facing agents.
- RAG over private data — Knowledge Bases to ground answers in company documents, wikis, or product catalogs.
- Agents and tool use — multi-step workflows that call internal APIs, look up data, and draft actions for human approval.
- Content generation and summarization — marketing copy, meeting notes, legal summaries, code explanations.
- Classification and extraction — invoice parsing, ticket routing, entity extraction with structured JSON output (see the tool-use sketch after this list).
- Embeddings for search — Titan or Cohere Embed feeding OpenSearch, Aurora pgvector, or Kendra.
- Image generation — Titan Image, Nova Canvas, Stability AI for creative and marketing assets.
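For the classification-and-extraction use case above, one common pattern is to force a tool call whose input schema is exactly the JSON you want back. A sketch, with a hypothetical `record_invoice` tool and an illustrative model ID:

```python
# Structured extraction via forced tool use: the model must "call" the tool,
# and the tool call's input payload is your schema-constrained JSON.
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

tool_config = {
    "tools": [{
        "toolSpec": {
            "name": "record_invoice",  # hypothetical tool name
            "description": "Record extracted invoice fields.",
            "inputSchema": {"json": {
                "type": "object",
                "properties": {
                    "vendor": {"type": "string"},
                    "total": {"type": "number"},
                    "due_date": {"type": "string"},
                },
                "required": ["vendor", "total"],
            }},
        }
    }],
    "toolChoice": {"tool": {"name": "record_invoice"}},  # force the tool call
}

response = bedrock.converse(
    modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # example ID
    messages=[{"role": "user", "content": [{"text": "Invoice: Acme Corp, $1,200, due 2025-07-01"}]}],
    toolConfig=tool_config,
)

# The structured fields arrive as the tool call's input, not as free text.
for block in response["output"]["message"]["content"]:
    if "toolUse" in block:
        print(block["toolUse"]["input"])
```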
Pricing Model
Bedrock is billed per-token or per-image, with no infrastructure charge:
- On-Demand — per 1,000 input tokens and per 1,000 output tokens; rates vary by model (Claude Opus is premium; Haiku and Nova Micro are cheap). No commitment, no minimum.
- Batch inference — roughly 50% off on-demand rates for jobs that run asynchronously.
- Provisioned Throughput — per hour per model unit, with optional 1- or 6-month terms for deeper discounts. Used for steady, high-volume workloads or custom fine-tuned models.
- Knowledge Bases — you pay for the embedding model invocations, the vector store (for example OpenSearch Serverless OCUs), and the retrieval model invocations during each query.
- Agents — pay for the underlying model calls plus any Lambda invocations used as tools.
- Guardrails — per text unit processed (1,000 characters).
- Model customization — per training token plus storage, then Provisioned Throughput for hosting.
Data transfer in is free; data transfer out follows standard AWS rates.
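To make the on-demand math concrete, a back-of-envelope sketch; the per-1K-token rates below are deliberately made-up placeholders, so substitute current per-model pricing before using this for real estimates:

```python
# Hypothetical rates -- NOT current Bedrock prices. Check the pricing page.
input_rate_per_1k = 0.003   # $ per 1,000 input tokens (placeholder)
output_rate_per_1k = 0.015  # $ per 1,000 output tokens (placeholder)

requests_per_day = 10_000
avg_input_tokens = 1_500
avg_output_tokens = 400

cost_per_request = (avg_input_tokens / 1000) * input_rate_per_1k \
                 + (avg_output_tokens / 1000) * output_rate_per_1k
daily = requests_per_day * cost_per_request
print(f"~${daily:,.2f}/day, ~${daily * 30:,.2f}/month on-demand")
# Batch inference at roughly half the on-demand rate would cut this ~50%
# for workloads that can run asynchronously.
```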
Pros and Cons
Pros
- Multiple top-tier model families behind one API — easy to A/B test or switch.
- Native AWS integration: IAM, VPC endpoints, CloudWatch, CloudTrail, KMS.
- Knowledge Bases, Agents, and Guardrails remove a lot of RAG/agent boilerplate.
- Data is not used to train foundation models and stays in-Region.
- Token-based pricing with no idle cost.
Cons
- Model availability and quotas vary by Region — can require cross-Region profiles.
- Cutting-edge models sometimes land on providers' first-party APIs (for example Anthropic's) before reaching Bedrock.
- Provisioned Throughput for custom models is costly — think enterprise workloads.
- Knowledge Bases ingestion is managed but opinionated; custom RAG may still use pgvector or OpenSearch directly.
- No OpenAI or Google models — those run on their own clouds or Azure/Vertex AI.
Comparison with Alternatives
| | Amazon Bedrock | OpenAI API | Google Vertex AI | Azure OpenAI |
| --- | --- | --- | --- | --- |
| Model families | Claude, Titan, Nova, Llama, Mistral, Cohere, AI21, Stability | GPT-4/5, o-series | Gemini, Claude (partner), Llama | GPT-4/5, o-series |
| Hosting | AWS Regions, in-VPC | OpenAI cloud | GCP Regions | Azure Regions |
| Pricing | Per token, Provisioned Throughput | Per token | Per token, provisioned | Per token, PTU |
| RAG primitive | Knowledge Bases | Assistants / File Search | Vertex AI Search | Azure AI Search |
| Best for | AWS-native teams, multi-model | Latest OpenAI models | GCP-native teams, Gemini | Azure-native teams, enterprise OpenAI |
Rule of thumb: if your workload runs on AWS and you want Claude or Nova without managing infrastructure, use Bedrock. If you specifically need GPT-4/5, use OpenAI or Azure OpenAI; for Gemini, Vertex AI.
Exam Relevance
- AI Practitioner (AIF-C01) — heavy coverage: Bedrock vs SageMaker, foundation model basics, RAG, Agents, Guardrails, prompt engineering.
- Machine Learning Engineer Associate (MLA-C01) — Bedrock deployment patterns, Knowledge Bases, model evaluation, combining Bedrock with SageMaker pipelines.
- Solutions Architect Associate (SAA-C03) — Bedrock behind API Gateway + Lambda as a serverless GenAI backend; VPC endpoints for private inference. A minimal handler sketch follows this list.
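A minimal sketch of that SAA-style pattern: an API Gateway proxy event handled by Lambda, which calls Bedrock. The model ID is illustrative, and the Lambda execution role needs `bedrock:InvokeModel` on the model's ARN:

```python
# Lambda handler for an API Gateway proxy integration calling Bedrock.
import json
import boto3

bedrock = boto3.client("bedrock-runtime")

def handler(event, context):
    prompt = json.loads(event["body"])["prompt"]  # assumes {"prompt": "..."} requests
    response = bedrock.converse(
        modelId="amazon.nova-lite-v1:0",  # example ID
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={"maxTokens": 512},
    )
    answer = response["output"]["message"]["content"][0]["text"]
    return {"statusCode": 200, "body": json.dumps({"answer": answer})}
```

With a VPC endpoint for `bedrock-runtime`, the Lambda-to-Bedrock hop never leaves the AWS network.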
Classic exam trap: Bedrock vs SageMaker. Bedrock = call a foundation model via API. SageMaker = train or host a custom model. If the scenario says "no infrastructure," "pay per token," or "multiple FM providers," the answer is Bedrock.
Frequently Asked Questions
Q: What's the difference between Bedrock and SageMaker?
A: Bedrock gives you a serverless API over pre-built foundation models (Claude, Titan, Nova, Llama, etc.) billed per token — you never manage GPUs. SageMaker is a platform for training, tuning, and hosting your own custom models on managed EC2 instances, billed per instance-second. If you're calling an LLM for summarization, chat, or RAG, start with Bedrock. If you're training a custom tabular or vision model, or hosting an open-source LLM that Bedrock doesn't carry, use SageMaker.
Q: How do Knowledge Bases for Bedrock work?
A: You point a Knowledge Base at a data source (S3, SharePoint, Salesforce, Confluence, or web), choose a chunking strategy, pick an embedding model (Titan Embed or Cohere Embed), and choose a vector store (OpenSearch Serverless, Aurora pgvector, Pinecone, Redis, MongoDB Atlas). Bedrock ingests, chunks, embeds, and indexes your documents. At query time, RetrieveAndGenerate embeds the user's question, finds top-k chunks, and asks the chosen generation model to answer using those chunks — a managed RAG pipeline in one API call.
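A sketch of that single call, assuming a Knowledge Base already exists; the knowledge base ID and model ARN are placeholders:

```python
# One-call managed RAG: retrieve relevant chunks, then generate an answer.
import boto3

agent_runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

response = agent_runtime.retrieve_and_generate(
    input={"text": "What is our parental leave policy?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KBEXAMPLE123",  # placeholder Knowledge Base ID
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/"
                        "anthropic.claude-3-haiku-20240307-v1:0",  # example model
        },
    },
)

print(response["output"]["text"])
# response["citations"] maps each answer span back to the retrieved chunks.
```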
Q: Is my data used to train Bedrock foundation models?
A: No. AWS's terms state that prompts, completions, fine-tuning data, and embeddings you send to Bedrock are not used to train any AWS or third-party foundation models. Data stays in the Region you invoke the model in, is encrypted in transit and at rest, and can be kept inside your VPC via PrivateLink endpoints. This is a significant difference from some consumer LLM APIs and is a major reason enterprises choose Bedrock.
This article reflects AWS features and pricing as of 2026. AWS services evolve rapidly — always verify against the official Amazon Bedrock documentation before making production decisions.