Module 5 Lesson 1: Streaming Responses

The Typing Effect. How to use converse_stream to send tokens to your UI as they are generated.

converse_stream: Real-Time AI

Waiting 10 seconds for a full 500-word response feels slow. Receiving the first word in ~200 ms and watching the rest type out feels instant. That is the power of streaming.

1. How it Works

Instead of a single JSON response, Bedrock sends a stream of events. Your code must iterate over this stream to capture the pieces as they arrive.
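Text deltas are only one of the event types in the stream; a typical `converse_stream` response also interleaves lifecycle and metadata events. The sketch below is illustrative: the shapes mirror the Converse API, but the text values are made up.

```python
# Illustrative event sequence from a converse_stream call.
# Shapes follow the Bedrock Converse API; the token text is made up.
events = [
    {"messageStart": {"role": "assistant"}},
    {"contentBlockDelta": {"delta": {"text": "Once"}, "contentBlockIndex": 0}},
    {"contentBlockDelta": {"delta": {"text": " upon"}, "contentBlockIndex": 0}},
    {"contentBlockStop": {"contentBlockIndex": 0}},
    {"messageStop": {"stopReason": "end_turn"}},
    {"metadata": {"usage": {"inputTokens": 10, "outputTokens": 2, "totalTokens": 12}}},
]

# Only the delta events carry text; everything else is bookkeeping.
text = "".join(
    e["contentBlockDelta"]["delta"]["text"]
    for e in events
    if "contentBlockDelta" in e
)
print(text)  # Once upon
```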

2. Python Implementation

import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

messages = [{"role": "user", "content": [{"text": "Write a 3-paragraph story."}]}]

# Notice the '_stream' suffix
response = client.converse_stream(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
    messages=messages
)

# The response is an iterable 'stream'
for event in response["stream"]:
    if "contentBlockDelta" in event:
        token = event["contentBlockDelta"]["delta"]["text"]
        print(token, end="", flush=True)
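In a real app you usually want the assembled text (and the final usage numbers) in addition to the live tokens. Here is a minimal sketch of a helper that does both; `consume_stream` is our own name, not a boto3 API, and the event shapes are assumed to match the Converse API as above.

```python
def consume_stream(stream, on_token=print):
    """Iterate a converse_stream event stream, calling on_token for each
    text delta. Returns (full_text, stop_reason, usage)."""
    parts, stop_reason, usage = [], None, None
    for event in stream:
        if "contentBlockDelta" in event:
            token = event["contentBlockDelta"]["delta"]["text"]
            parts.append(token)
            on_token(token)
        elif "messageStop" in event:
            stop_reason = event["messageStop"]["stopReason"]
        elif "metadata" in event:
            usage = event["metadata"].get("usage")
    return "".join(parts), stop_reason, usage

# Usage with the real client:
# text, reason, usage = consume_stream(
#     response["stream"],
#     on_token=lambda t: print(t, end="", flush=True),
# )
```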

3. Visualizing the Handoff

graph LR
    User[Query] --> B[Bedrock Call]
    B --> Event1[Token: 'Once']
    B --> Event2[Token: ' upon']
    B --> Event3[Token: ' a']
    B --> Event4[Token: ' time']
    
    Event1 --> Screen[User UI]
    Event2 --> Screen
    Event3 --> Screen
    Event4 --> Screen

4. Why Streaming is Mandatory for UX

  • TTFT (Time to First Token): the metric users actually feel. Streaming cuts TTFT from multiple seconds down to a few hundred milliseconds.
  • Engagement: watching the response appear in real time keeps users engaged instead of refreshing the page in frustration.
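TTFT is easy to measure yourself: note the clock when you start consuming the stream and again when the first delta arrives. A minimal sketch, with the caveats that `first_token_latency` is our own helper (not part of boto3) and that for a precise number you would start the clock just before calling `converse_stream`:

```python
import time

def first_token_latency(stream):
    """Return (ttft_seconds, full_text) for a converse_stream event stream.

    The clock starts when iteration begins, which approximates TTFT
    because boto3 returns the stream object before tokens arrive.
    """
    start = time.monotonic()
    ttft, parts = None, []
    for event in stream:
        if "contentBlockDelta" in event:
            if ttft is None:
                ttft = time.monotonic() - start
            parts.append(event["contentBlockDelta"]["delta"]["text"])
    return ttft, "".join(parts)
```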

Summary

  • converse_stream provides token-by-token output.
  • You must iterate over the stream events.
  • contentBlockDelta is the event type containing the actual text.
  • Streaming significantly improves the perceived speed of your application.
