Every successful AI solution starts small—a chatbot that finally understands a tricky question, or a pipeline that pulls insights from documents in seconds. At first, these run on a developer’s laptop or a team’s AWS sandbox. Feedback is instant. Fixes are easy.
But as soon as your generative-AI prototype shows promise, expectations shift. Business users want uptime. Compliance teams want guardrails. Suddenly, your API faces hundreds or thousands of requests. You need to ensure not just accuracy, but also cost control, privacy, and auditability.
This is the leap from prototype to production. Here, architecture and MLOps (Machine Learning Operations) are not optional—they’re your flight plan and your ground crew.
Let's use a simple analogy: think of your solution as an airport.
In production, ad-hoc scripts and manual fixes won't cut it. You need strong architecture and disciplined operations. Here's what that means for Bedrock-powered solutions:
Architecture is your airport’s blueprint: runways, terminals, control towers. In software, this means designing for:
MLOps is your airport’s operations team. It keeps models healthy, deployments on time, and incidents managed. MLOps means:
If you’re new to MLOps, see Chapter 11 for a deep dive.
Modern Bedrock production workloads also benefit from prompt-level optimizations:
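One of the biggest levers here is prompt caching, which lets Bedrock reuse a long, static prefix (instructions, policies, reference text) across requests instead of reprocessing it on every call. The sketch below is illustrative rather than definitive: it assumes your chosen model supports prompt caching through cachePoint blocks in the Converse API, and the model ID and system text are placeholders.
import boto3

bedrock = boto3.client('bedrock-runtime')

# The long, static instructions sit before the cache point so they can be reused;
# only the short, user-specific message changes from call to call.
response = bedrock.converse(
    modelId='anthropic.claude-3-5-sonnet-20240620-v1:0',  # placeholder model ID
    system=[
        {'text': 'You are a contract analyst. Apply the firm style guide and review checklist.'},
        {'cachePoint': {'type': 'default'}},  # cache everything above this marker
    ],
    messages=[
        {'role': 'user', 'content': [{'text': 'Summarize this contract.'}]},
    ],
)
print(response['output']['message']['content'][0]['text'])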
Observability is critical for reliability and compliance:
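In practice, that means more than plain text logs. Alongside structured logs and traces, you can publish the signals you care about, such as latency and token counts, as custom CloudWatch metrics. A minimal sketch, assuming the GenAI/Bedrock namespace and the metric names are your own choices:
import boto3

cloudwatch = boto3.client('cloudwatch')

def record_invocation_metrics(latency_ms, input_tokens, output_tokens):
    # Namespace and metric names are illustrative; pick ones that fit your dashboards and alarms.
    cloudwatch.put_metric_data(
        Namespace='GenAI/Bedrock',
        MetricData=[
            {'MetricName': 'InvocationLatency', 'Value': latency_ms, 'Unit': 'Milliseconds'},
            {'MetricName': 'InputTokens', 'Value': input_tokens, 'Unit': 'Count'},
            {'MetricName': 'OutputTokens', 'Value': output_tokens, 'Unit': 'Count'},
        ],
    )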
Let’s compare a quick prototype with a production-ready deployment. Notice how the production version builds in configuration, structured logging, error handling, and trace context—essentials for real-world reliability and observability.
# Prototype: quick and dirty
import json
import boto3

bedrock = boto3.client('bedrock-runtime')
response = bedrock.invoke_model(
    modelId='anthropic.claude-v2',
    body=json.dumps({
        'prompt': '\n\nHuman: Summarize this contract.\n\nAssistant:',
        'max_tokens_to_sample': 300,
    })
)
print(json.loads(response['body'].read())['completion'])
# Production: robust, secure, observable, and optimized
import json
import logging
import os
import uuid

import boto3
from aws_xray_sdk.core import patch_all, xray_recorder

patch_all()  # Instrument supported libraries (including boto3) for X-Ray tracing

# Configure logging for structured (JSON) output; each message is itself a JSON
# payload that carries the request's trace_id for correlation.
logger = logging.getLogger()
logger.setLevel(logging.INFO)
log_handler = logging.StreamHandler()
formatter = logging.Formatter(json.dumps({
    "timestamp": "%(asctime)s",
    "level": "%(levelname)s",
    "message": "%(message)s",
}))
log_handler.setFormatter(formatter)
logger.handlers = [log_handler]


def handler(event, context):
    bedrock = boto3.client('bedrock-runtime')
    trace_id = str(uuid.uuid4())  # Correlation ID included in every log record
    try:
        prompt = event['prompt']  # Input comes from the event, not hard-coded
        model_id = os.environ['BEDROCK_MODEL_ID']  # Model choice is configuration, not code

        # Trace the model call as its own X-Ray subsegment
        with xray_recorder.in_subsegment('BedrockInvoke'):
            response = bedrock.invoke_model(
                modelId=model_id,
                body=json.dumps({
                    # Request format shown here is for Anthropic Claude text models;
                    # adjust the body (and enable prompt caching, where supported) per model.
                    'prompt': f'\n\nHuman: {prompt}\n\nAssistant:',
                    'max_tokens_to_sample': 500,
                })
            )
        result = json.loads(response['body'].read())
        completion = result.get('completion', '')

        # Structured log for auditing and debugging
        logger.info(json.dumps({
            "event": "bedrock_invoke",
            "request": prompt,
            "response": completion,
            "model_id": model_id,
            "trace_id": trace_id
        }))
        return {"statusCode": 200, "body": completion}
    except Exception as e:
        logger.error(json.dumps({
            "event": "error",
            "error": str(e),
            "trace_id": trace_id
        }))
        return {"statusCode": 500, "body": "Internal server error."}
Key improvements in the production code:
These are the first steps toward systems that work reliably, at scale, and under pressure.
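Hardening extends to the client itself. Rather than relying on default network behavior, you can give the Bedrock client explicit timeouts and retry settings through botocore's configuration object; the values below are illustrative, not recommendations.
import boto3
from botocore.config import Config

# Explicit timeouts and adaptive retries protect the handler from slow or throttled calls.
bedrock_config = Config(
    connect_timeout=5,    # seconds to establish the connection
    read_timeout=60,      # seconds to wait for the model's response
    retries={'max_attempts': 3, 'mode': 'adaptive'},
)
bedrock = boto3.client('bedrock-runtime', config=bedrock_config)
In the production handler shown earlier, you would pass this config when creating the client so that every invocation inherits the same timeout and retry behavior.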
In summary:
Throughout this chapter, you’ll learn how to: