Before you can race a high-performance car, you need to know what’s under the hood and how the controls work. The same is true for building with Amazon Bedrock—AWS’s engine room for generative AI. Bedrock transforms prompts and raw data into business value, but to use it effectively, you must first understand its core architecture and controls.
This chapter is your quick tour of Bedrock’s fundamentals. We’ll open the hood, explain how Bedrock simplifies access to a growing catalog of powerful AI models, and show you how it abstracts away the complexity of managing infrastructure. Think of Bedrock as a universal remote for foundation models—large, pre-trained AI models capable of tasks like summarization, Q&A, text generation, and even multimodal understanding.
The Bedrock model catalog is expanding rapidly, now featuring state-of-the-art models such as Anthropic Claude 3.5 Haiku and 3.7 Sonnet, Meta Llama 4 (Scout 17B, Maverick 17B), Amazon Nova Sonic, and Writer Palmyra X5/X4. These models offer advanced capabilities, larger context windows, and multimodal support. Selecting the right model for your use case is a key architectural decision—one we’ll revisit throughout the book.
Why start here? Every advanced Bedrock pattern—like Retrieval-Augmented Generation (RAG), model tuning, or multi-step agents—relies on a clear understanding of Bedrock’s building blocks. For example, you’ll need to know what a prompt is (the input text you send to a model), what a foundation model is, and the difference between runtime and agent modes (these determine how you interact with models and orchestrate tasks; we’ll define both soon).
A major recent development is prompt caching, now generally available for the latest Claude 3.x, Amazon Nova, and Llama 4 models. Prompt caching lets Bedrock store the already-processed form of a repeated prompt prefix (such as long system instructions or shared context) and reuse it across calls, dramatically reducing both latency and cost in high-throughput or repetitive workloads. Even in simple API examples, understanding and leveraging prompt caching is a modern best practice—covered in detail in Chapter 5, but introduced here as a foundational concept.
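As a preview, here is a minimal sketch of what a cache-enabled request looks like with Bedrock's Converse API. The cachePoint content block marks everything before it as cacheable; the model ID and system prompt are illustrative assumptions, so confirm what is enabled in your own account and Region.

```python
import json

# A long, stable preamble worth caching across many requests (illustrative).
SYSTEM_PROMPT = "You are a meticulous analyst who writes concise summaries. " * 40

def build_cached_request(user_text: str) -> dict:
    """Build keyword arguments for bedrock-runtime's converse() call.

    The cachePoint block tells Bedrock to cache everything before it
    (here, the long system prompt), so repeated calls reuse the
    already-processed prefix instead of reprocessing it every time.
    """
    return {
        # Model ID is illustrative; confirm availability in your Region.
        "modelId": "anthropic.claude-3-5-haiku-20241022-v1:0",
        "system": [
            {"text": SYSTEM_PROMPT},
            {"cachePoint": {"type": "default"}},  # cache the prefix above
        ],
        "messages": [
            {"role": "user", "content": [{"text": user_text}]}
        ],
        "inferenceConfig": {"maxTokens": 512},
    }

request = build_cached_request("Summarize the following text: ...")
print(json.dumps(request)[:80])

# To actually invoke (requires AWS credentials and model access):
#   import boto3
#   client = boto3.client("bedrock-runtime")
#   response = client.converse(**request)
#   print(response["output"]["message"]["content"][0]["text"])
```

Because only the request body changes, you can adopt caching incrementally: start with the plain call, then add a cachePoint once a prompt prefix stabilizes.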
Throughout this chapter, you’ll see practical examples and analogies. For instance, setting IAM permissions for Bedrock is like giving your pit crew the right tools—essential for safe, secure operations. You’ll also get hands-on with the Bedrock API to see how a single prompt can generate a summarized report in seconds.
import boto3
import json

# Initialize the Bedrock runtime client
bedrock = boto3.client('bedrock-runtime')

# Prepare the request body (Claude 3.x models use the Messages API format)
payload = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 512,
    "messages": [
        {"role": "user", "content": "Summarize the following text: ..."}
    ]
}

# Call a foundation model (e.g., Anthropic Claude 3.5 Haiku)
response = bedrock.invoke_model(
    modelId='anthropic.claude-3-5-haiku-20241022-v1:0',  # confirm the exact ID (or inference profile) in your Region
    contentType='application/json',
    accept='application/json',
    body=json.dumps(payload)
)

# Read and print the model's output
print(response['body'].read())
This code shows how simple it is to use Bedrock: a few lines of Python, and you’re tapping into advanced AI—no need to manage servers, GPUs, or model deployments. Each line is commented for clarity. (Note: This is a synchronous, stateless API call. Error handling is omitted for brevity. For production, consider leveraging prompt caching for repeated prompts to reduce cost and latency—see Chapter 5 for details.)
Key concepts to watch for as you read:

- Foundation models and Bedrock's expanding model catalog
- Prompts: the input text you send to a model
- Runtime versus agent modes of interaction
- Prompt caching for latency and cost savings
- IAM permissions and security fundamentals

We'll explore each in detail.
By the end of this chapter, you’ll have the foundation needed to build, deploy, and scale generative AI on AWS. Mastering these basics—including prompt caching and model selection—will make advanced topics like RAG, model customization, and agent orchestration much easier (see Chapters 3, 5, 6, and 9 for deep dives).
Ready to get started? Next, we’ll break down Bedrock’s architecture and service model, giving you the blueprint to confidently build with generative AI.
Amazon Bedrock is your all-in-one gateway to the world’s leading foundation models—no server management, no scaling headaches, and no model-specific API puzzles. Think of it as a universal power strip for AI: plug in any supported model (such as Claude, Titan, Mistral, or Llama), and power up through a single, consistent interface. As of April 2025, Bedrock supports the newest models, including Llama 4 Scout 17B and Maverick 17B (with multimodal capabilities), and Writer Palmyra X5/X4, offering enhanced context windows and image understanding. As your needs evolve, you can swap models without reworking your code, while always benefiting from AWS security and scalability.
Bedrock is fully managed—AWS handles the hardware, scaling, updates, and security behind the scenes. You interact with Bedrock through a unified API, available via the AWS Console, CLI, or SDKs such as Boto3 for Python. This API abstracts away model differences, much like a universal TV remote lets you control many devices with one set of buttons.
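To make the "universal remote" concrete, here is a small sketch showing that with the Converse API, swapping providers changes only the model ID, never the request shape. The model IDs listed are illustrative; confirm which models are enabled in your account and Region.

```python
def summarize_request(model_id: str, text: str) -> dict:
    """Build the same Converse request shape for any Bedrock model family."""
    return {
        "modelId": model_id,
        "messages": [
            {"role": "user", "content": [{"text": f"Summarize: {text}"}]}
        ],
        "inferenceConfig": {"maxTokens": 256},
    }

# Swapping providers is just a different model ID (IDs are illustrative):
MODEL_IDS = [
    "anthropic.claude-3-5-haiku-20241022-v1:0",
    "meta.llama4-scout-17b-instruct-v1:0",
    "amazon.titan-text-express-v1",
]

for model_id in MODEL_IDS:
    request = summarize_request(model_id, "Quarterly revenue grew 12%.")
    # client.converse(**request) would run unchanged for each model:
    print(request["modelId"])
```

This is the practical payoff of the unified API: model selection becomes a configuration choice rather than a code rewrite.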
Bedrock supports two primary operational modes, each optimized for different workflows: