Module 5 Lesson 1: Streaming Responses
The typing effect: how to use converse_stream to send tokens to your UI as they are generated.
converse_stream: Real-Time AI
Waiting 10 seconds for a full 500-word response feels slow. Receiving the first word in 200 ms and watching the rest type out feels instant. This is the power of streaming.
1. How it Works
Instead of a single JSON response, Bedrock sends a stream of events. Your code must iterate over this stream to assemble the pieces.
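To see what that event stream looks like before making a real call, here is a sketch that iterates over a simulated stream. The event names (messageStart, contentBlockDelta, messageStop) match the Converse API, but the payloads below are hand-written stand-ins, not real model output:

```python
# Simulated converse_stream events: each event is a dict keyed by its type.
simulated_stream = [
    {"messageStart": {"role": "assistant"}},
    {"contentBlockDelta": {"delta": {"text": "Hello"}, "contentBlockIndex": 0}},
    {"contentBlockDelta": {"delta": {"text": " world"}, "contentBlockIndex": 0}},
    {"contentBlockStop": {"contentBlockIndex": 0}},
    {"messageStop": {"stopReason": "end_turn"}},
]

pieces = []
stop_reason = None
for event in simulated_stream:
    if "contentBlockDelta" in event:
        # Text arrives in small deltas; collect them in order.
        pieces.append(event["contentBlockDelta"]["delta"]["text"])
    elif "messageStop" in event:
        # The final event tells you why generation ended.
        stop_reason = event["messageStop"]["stopReason"]

print("".join(pieces))  # Hello world
```

Your handler only needs to care about the event types it uses; unknown event types can simply be ignored.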
2. Python Implementation
```python
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

messages = [{"role": "user", "content": [{"text": "Write a 3-paragraph story."}]}]

# Notice the '_stream' suffix
response = client.converse_stream(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
    messages=messages,
)

# The response contains an iterable 'stream' of events
for event in response["stream"]:
    if "contentBlockDelta" in event:
        token = event["contentBlockDelta"]["delta"]["text"]
        print(token, end="", flush=True)
```
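In a real application you usually want the tokens as they arrive *and* the assembled text at the end. One way to get both is a small hypothetical helper (`stream_text` is our name, not a boto3 API) that wraps the loop above as a generator and works with any iterable of Converse-style events:

```python
def stream_text(stream):
    """Yield text deltas from a converse_stream-style event iterable."""
    for event in stream:
        if "contentBlockDelta" in event:
            yield event["contentBlockDelta"]["delta"]["text"]

# Demo with simulated events; a real call would pass response["stream"].
fake_stream = [
    {"contentBlockDelta": {"delta": {"text": "Once"}}},
    {"contentBlockDelta": {"delta": {"text": " upon"}}},
    {"messageStop": {"stopReason": "end_turn"}},
]

full_text = "".join(stream_text(fake_stream))
print(full_text)  # Once upon
```

Because `stream_text` is a generator, the UI can render each token the moment it is yielded, while `"".join(...)` still gives you the complete response for logging or storage.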
3. Visualizing the Handoff
```mermaid
graph LR
    User[Query] --> B[Bedrock Call]
    B --> Event1[Token: 'Once']
    B --> Event2[Token: ' upon']
    B --> Event3[Token: ' a']
    B --> Event4[Token: ' time']
    Event1 --> Screen[User UI]
    Event2 --> Screen
    Event3 --> Screen
    Event4 --> Screen
```
4. Why Streaming is Mandatory for UX
- TTFT (Time to First Token): This is the metric users actually feel. Streaming can cut TTFT from several seconds to a few hundred milliseconds, because the first token reaches the screen before the rest of the response is generated.
- Engagement: Watching the AI "Think" live prevents users from refreshing the page or getting frustrated.
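TTFT is easy to measure yourself: record the clock when the request starts, then note the elapsed time at the first contentBlockDelta. A minimal sketch, using a hypothetical generator with artificial sleeps in place of a real network call:

```python
import time

def slow_stream():
    """Stand-in for a real event stream, with artificial latency."""
    time.sleep(0.05)  # simulates network + model time before the first token
    yield {"contentBlockDelta": {"delta": {"text": "Hi"}}}
    time.sleep(0.05)
    yield {"contentBlockDelta": {"delta": {"text": " there"}}}

start = time.monotonic()
ttft = None
for event in slow_stream():
    if "contentBlockDelta" in event and ttft is None:
        # First text delta observed: this is the latency the user feels.
        ttft = time.monotonic() - start

print(f"TTFT: {ttft * 1000:.0f} ms")
```

With a real Bedrock call, the same stopwatch around `response["stream"]` shows why streaming wins: TTFT stays roughly constant no matter how long the full response is.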
Summary
- `converse_stream` provides token-by-token output.
- You must iterate over the stream events.
- `contentBlockDelta` is the event type containing the actual text.
- Streaming significantly improves the perceived speed of your application.