
Lesson 5: Iterative Improvement Cycles
Master the rhythm of the AI architect. Learn how to turn evaluation data into architectural action, following a strict 'Run-Analyze-Pivot' cycle to reach production-grade reliability.
Module 11: Evaluating Agent Performance
Lesson 5: Iterative Improvement based on Performance Data
Building an AI system is not a "Once and Done" task. It is a Cycle. In this final lesson of Module 11, we look at the "Improvement Loop" that takes a prototype from 60% accuracy to 99% accuracy.
This cycle is the primary day-to-day workflow of a Certified Architect.
1. Step 1: The "Failure Audit"
Don't fix what isn't broken.
- Run your Eval Suite.
- Filter for the Failed Cases.
- Read the
<thinking>logs for those specific cases to see where the model's logic went off the rails.
2. Step 2: The "Surgical" Prompt Edit
Many developers "Over-Correct." If the model fails one test, they rewrite the entire prompt.
- The Move: Make the smallest possible change to the instruction that address the specific failure found in Step 1.
- Example: If the model missed a constraint, don't rewrite the whole role; just emphasize the constraint in the "Guardrails" section (Module 7).
3. Step 3: Regression Testing (The Verify Phase)
- Run the Eval Suite again.
- Check the Pass Rate: Did it go up?
- CRITICAL: Check the previously passing cases. Did your edit break something that used to work? (This is a "Regression").
If Accuracy went up and no Regressions occurred, you have successfully "Iterated."
4. Visualizing the Improvement Loop
graph TD
A[Baseline Eval] --> B[Failure Audit]
B --> C[Hypothesis: 'I need a tighter Schema']
C --> D[Surgical Edit]
D --> E[Regression Test]
E -->|Success| F[Next Goal]
E -->|Failure/Regress| G[Pivot Hypothesis]
G --> D
5. Summary of Module 11
Module 11 has mastered the "Science" of AI.
- You used Benchmarks to set the floor (Lesson 1).
- You used Scoring to balance trade-offs (Lesson 2).
- You built an Eval Suite for automation (Lesson 3).
- You used Analysis to find root causes (Lesson 4).
- You adopted the Iterative Cycle for refinement (Lesson 5).
In Module 12, we look at the money: Cost and Token Optimization.
Interactive Quiz
- Why should you only make "Surgical" (small) edits to prompts?
- What is a "Regression" in an AI evaluation?
- Why is it important to read the
<thinking>log during the failure audit? - Scenario: You are at 92% accuracy. Your last 3 iterations have not increased the score. What is this called, and what architectural "Pivot" might you try? (e.g., Change model? Decompose task?)
Reference Video: