Before you can race a high-performance car, you need to know what’s under the hood and how the controls work. The same is true for building with Amazon Bedrock—AWS’s engine room for generative AI. Bedrock transforms prompts and raw data into business value, but to use it effectively, you must first understand its core architecture and controls.
This chapter is your quick tour of Bedrock’s fundamentals. We’ll open the hood, explain how Bedrock simplifies access to a growing catalog of powerful AI models, and show you how it abstracts away the complexity of managing infrastructure. Think of Bedrock as a universal remote for foundation models—large, pre-trained AI models capable of tasks like summarization, Q&A, text generation, and even multimodal understanding.
The Bedrock model catalog is expanding rapidly, now featuring state-of-the-art models such as Anthropic Claude 3.5 Haiku and 3.7 Sonnet, Meta Llama 4 (Scout 17B, Maverick 17B), Amazon Nova Sonic, and Writer Palmyra X5/X4. These models offer advanced capabilities, larger context windows, and multimodal support. Selecting the right model for your use case is a key architectural decision—one we’ll revisit throughout the book.
Why start here? Every advanced Bedrock pattern—like Retrieval-Augmented Generation (RAG), model tuning, or multi-step agents—relies on a clear understanding of Bedrock’s building blocks. For example, you’ll need to know what a prompt is (the input text you send to a model), what a foundation model is, and the difference between runtime and agent modes (these determine how you interact with models and orchestrate tasks; we’ll define both soon).
A major recent development is prompt caching, now generally available for the latest Claude 3.x, Amazon Nova, and Llama 4 models. Prompt caching lets Bedrock store the already-processed form of a repeated prompt prefix (such as long system instructions or shared context) and reuse it across calls, dramatically reducing both latency and cost in high-throughput or repetitive workloads. Even in simple API examples, understanding and leveraging prompt caching is a modern best practice—covered in detail in Chapter 5, but introduced here as a foundational concept.
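As a preview, here is a minimal sketch of what a cache-enabled request looks like with Bedrock's Converse API. The cachePoint content block marks everything before it as cacheable; the model ID and system prompt are illustrative assumptions, so confirm what is enabled in your own account and Region.

```python
import json

# A long, stable preamble worth caching across many requests (illustrative).
SYSTEM_PROMPT = "You are a meticulous analyst who writes concise summaries. " * 40

def build_cached_request(user_text: str) -> dict:
    """Build keyword arguments for bedrock-runtime's converse() call.

    The cachePoint block tells Bedrock to cache everything before it
    (here, the long system prompt), so repeated calls reuse the
    already-processed prefix instead of reprocessing it every time.
    """
    return {
        # Model ID is illustrative; confirm availability in your Region.
        "modelId": "anthropic.claude-3-5-haiku-20241022-v1:0",
        "system": [
            {"text": SYSTEM_PROMPT},
            {"cachePoint": {"type": "default"}},  # cache the prefix above
        ],
        "messages": [
            {"role": "user", "content": [{"text": user_text}]}
        ],
        "inferenceConfig": {"maxTokens": 512},
    }

request = build_cached_request("Summarize the following text: ...")
print(json.dumps(request)[:80])

# To actually invoke (requires AWS credentials and model access):
#   import boto3
#   client = boto3.client("bedrock-runtime")
#   response = client.converse(**request)
#   print(response["output"]["message"]["content"][0]["text"])
```

Because only the request body changes, you can adopt caching incrementally: start with the plain call, then add a cachePoint once a prompt prefix stabilizes.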
Throughout this chapter, you’ll see practical examples and analogies. For instance, setting IAM permissions for Bedrock is like giving your pit crew the right tools—essential for safe, secure operations. You’ll also get hands-on with the Bedrock API to see how a single prompt can generate a summarized report in seconds.
import boto3
import json

# Initialize the Bedrock runtime client
bedrock = boto3.client('bedrock-runtime')

# Prepare the request body (Claude 3.x models use the Messages API format)
payload = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 512,
    "messages": [
        {"role": "user", "content": "Summarize the following text: ..."}
    ]
}

# Call a foundation model (e.g., Anthropic Claude 3.5 Haiku)
response = bedrock.invoke_model(
    modelId='anthropic.claude-3-5-haiku-20241022-v1:0',  # confirm the exact ID (or inference profile) in your Region
    contentType='application/json',
    accept='application/json',
    body=json.dumps(payload)
)

# Read and print the model's output
print(response['body'].read())
This code shows how simple it is to use Bedrock: a few lines of Python, and you’re tapping into advanced AI—no need to manage servers, GPUs, or model deployments. Each line is commented for clarity. (Note: This is a synchronous, stateless API call. Error handling is omitted for brevity. For production, consider leveraging prompt caching for repeated prompts to reduce cost and latency—see Chapter 5 for details.)
Key concepts to watch for as you read:

- Foundation models and Bedrock's expanding model catalog
- Prompts: the input text you send to a model
- Runtime versus agent modes of interaction
- Prompt caching for latency and cost savings
- IAM permissions and security fundamentals

We'll explore each in detail.
By the end of this chapter, you’ll have the foundation needed to build, deploy, and scale generative AI on AWS. Mastering these basics—including prompt caching and model selection—will make advanced topics like RAG, model customization, and agent orchestration much easier (see Chapters 3, 5, 6, and 9 for deep dives).
Ready to get started? Next, we’ll break down Bedrock’s architecture and service model, giving you the blueprint to confidently build with generative AI.
Amazon Bedrock is your all-in-one gateway to the world’s leading foundation models—no server management, no scaling headaches, and no model-specific API puzzles. Think of it as a universal power strip for AI: plug in any supported model (such as Claude, Titan, Mistral, or Llama), and power up through a single, consistent interface. As of April 2025, Bedrock supports the newest models, including Llama 4 Scout 17B and Maverick 17B (with multimodal capabilities), and Writer Palmyra X5/X4, offering enhanced context windows and image understanding. As your needs evolve, you can swap models without reworking your code, while always benefiting from AWS security and scalability.
Bedrock is fully managed—AWS handles the hardware, scaling, updates, and security behind the scenes. You interact with Bedrock through a unified API, available via the AWS Console, CLI, or SDKs such as Boto3 for Python. This API abstracts away model differences, much like a universal TV remote lets you control many devices with one set of buttons.
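To make the "universal remote" concrete, here is a small sketch showing that with the Converse API, swapping providers changes only the model ID, never the request shape. The model IDs listed are illustrative; confirm which models are enabled in your account and Region.

```python
def summarize_request(model_id: str, text: str) -> dict:
    """Build the same Converse request shape for any Bedrock model family."""
    return {
        "modelId": model_id,
        "messages": [
            {"role": "user", "content": [{"text": f"Summarize: {text}"}]}
        ],
        "inferenceConfig": {"maxTokens": 256},
    }

# Swapping providers is just a different model ID (IDs are illustrative):
MODEL_IDS = [
    "anthropic.claude-3-5-haiku-20241022-v1:0",
    "meta.llama4-scout-17b-instruct-v1:0",
    "amazon.titan-text-express-v1",
]

for model_id in MODEL_IDS:
    request = summarize_request(model_id, "Quarterly revenue grew 12%.")
    # client.converse(**request) would run unchanged for each model:
    print(request["modelId"])
```

This is the practical payoff of the unified API: model selection becomes a configuration choice rather than a code rewrite.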
Bedrock supports two primary operational modes, each optimized for different workflows: