
Lesson 4: Context Pruning and Compression
Master the automation of memory. Learn how to implement 'Summarization-as-you-go' and 'Rolling Windows' to keep your agents lean, focused, and token-efficient during long-running sessions.
Module 9: Context Management
In a 50-turn conversation, the details of "Turn 2" are usually irrelevant to "Turn 50." If you keep Turn 2 in the prompt, you are paying for data that doesn't add value. To solve this, architects use automated pruning and semantic compression.
In this lesson, we look at the three most common strategies for keeping a conversation lean.
1. The "Rolling Window" Strategy
This is the simplest method. You only keep the last N turns of history in the prompt.
- Example: Only the last 10 turns.
- Problem: If the user references something from turn 5 (e.g., "Use that API key I gave you earlier"), the model will have forgotten it.
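A rolling window can be sketched in a few lines. The message format and the names (`history`, `rolling_window`, `max_turns`) are illustrative, not tied to any particular SDK:

```python
def rolling_window(history, max_turns=10):
    """Keep only the last `max_turns` messages; everything older is dropped."""
    return history[-max_turns:]

# A 50-turn conversation in a simple role/content message format.
history = [{"role": "user", "content": f"turn {i}"} for i in range(50)]

pruned = rolling_window(history)
print(len(pruned))                 # 10
print(pruned[0]["content"])        # "turn 40" -- turn 5's API key is gone
```

Note that the weakness described above falls directly out of the slice: anything before `history[-max_turns:]` is simply unreachable.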
2. The "Summarization" Strategy (Semantic Compression)
Instead of deleting old turns, you summarize them.
- You ask a separate, cheaper model (like Claude Haiku) to: "Summarize the core facts and decisions from the first 20 turns of this chat."
- You then delete the 20 raw turns and replace them with a single "Conversation Summary" block in the prompt.
Result: You go from 20,000 tokens of history to 200 tokens, but the model still remembers the core decisions.
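The summarize-and-replace step can be sketched as below. Here `summarize` is a hypothetical callable standing in for a call to a cheap model such as Claude Haiku; the message format and function names are assumptions for illustration:

```python
def compress_history(history, summarize, keep_recent=5):
    """Replace all but the last `keep_recent` turns with one summary message.

    `summarize` is a placeholder for a cheap-model API call
    (e.g. Claude Haiku) that takes a prompt string and returns a summary.
    """
    if len(history) <= keep_recent:
        return history  # nothing worth compressing yet
    old, recent = history[:-keep_recent], history[-keep_recent:]
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in old)
    summary = summarize(
        "Summarize the core facts and decisions from this chat:\n" + transcript
    )
    # One compact summary block replaces thousands of tokens of raw history.
    return [{"role": "user", "content": f"[Conversation summary] {summary}"}] + recent
```

In a real agent loop you would trigger this whenever the history crosses a token budget, rather than on every turn, so the summarization cost is amortized.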
3. The "Selective Preservation" Strategy
You tag specific messages as "Anchor Facts."
- If a user provides an API key or an architectural requirement, mark it as Priority: High.
- During pruning, never delete high-priority items. Delete everything else (greetings, formatting questions, etc.).
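A minimal sketch of this tagging scheme, assuming each message is a dict and a `priority` field marks anchor facts (the field name and function are illustrative conventions, not a library API):

```python
def prune_with_anchors(history, keep_recent=5):
    """Keep every high-priority 'anchor fact' plus the last `keep_recent` turns."""
    recent = history[-keep_recent:]
    anchors = [
        m for m in history[:-keep_recent]
        if m.get("priority") == "high"  # anchor facts survive pruning
    ]
    return anchors + recent

# e.g. the database schema from Turn 1 is tagged as an anchor fact.
history = [{"role": "user", "content": f"turn {i}"} for i in range(40)]
history[0]["priority"] = "high"

pruned = prune_with_anchors(history)
print(len(pruned))            # 6: one anchor + five recent turns
print(pruned[0]["content"])   # "turn 0" -- the schema is still in context
```

This composes naturally with the summarization strategy: summarize the non-anchor remainder instead of deleting it outright.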
4. Visualizing Compression
```mermaid
graph TD
    History[50 Raw Turns - 100k Tokens] --> C{Compression Engine}
    C -->|Rolling| R[Last 5 Turns - 10k Tokens]
    C -->|Summarize| S[1 Summary Turn - 1k Tokens]
    C -->|Selective| P[Summary + Anchor Facts - 2k Tokens]
    P --> Model[Focused Claude Response]
```
5. Summary
- Pruning saves money and increases focus.
- Summarization is the best balance between memory and cost.
- Anchor Facts prevent the model from forgetting mission-critical data.
In the final lesson of this module, we look at how to apply these in practice: Multi-turn Conversation Handling.
Interactive Quiz
- What is a "Rolling Window" and what is its main weakness?
- How does "Semantic Compression" (Summarization) work?
- Why use Claude Haiku for summarization instead of Claude Opus?
- Scenario: A user gives you a database schema in Turn 1. You are now at Turn 40. How would you "Selectively Preserve" that schema during context pruning?
Reference Video: