Lesson 4: Balancing Performance and Cost

Master the efficiency frontier. Learn how to design 'Model-Switching' architectures that use cheap models for simple tasks and premium models for complex reasoning to optimize your overall burn rate.


Module 12: Cost and Token Optimization

A naive architect uses Claude Sonnet for everything. An advanced architect matches the model to the job. If you use a high-intelligence model for a trivial task (such as answering "Is this message a question?"), you are burning money.

In this lesson, we master the Model Switching pattern and learn how to optimize the cost of your entire agentic chain.


1. The "Cheap Router" Pattern

Instead of sending every user request to Sonnet, use Haiku as a gatekeeper.

  1. The Router (Haiku): "Is this request complex (requires code) or simple (requires a greeting)?"
  2. Path A (Simple): Haiku answers immediately. (Illustrative cost: $0.0001 per request)
  3. Path B (Complex): The request is forwarded to Sonnet. (Illustrative cost: $0.003 per request)

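By routing simple queries away from the premium model, you can reduce your total LLM bill by roughly 30-70%. The exact figure depends on what share of your traffic is simple and on how much the router call itself costs.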
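
The routing decision and its effect on cost can be sketched as follows. This is a minimal sketch, not a definitive implementation: `classify`, `cheap`, and `premium` are hypothetical callables standing in for your own model calls, the per-request costs are the illustrative figures from the list above, and the sketch assumes every request pays for one separate router call.

```python
from typing import Callable

# Illustrative per-request costs taken from the figures above (USD).
CHEAP_COST = 0.0001    # cheap model (Haiku) handles the request
PREMIUM_COST = 0.003   # premium model (Sonnet) handles the request


def route(message: str,
          classify: Callable[[str], str],
          cheap: Callable[[str], str],
          premium: Callable[[str], str]) -> str:
    """Send the message to the cheap or premium handler based on the router's label."""
    label = classify(message).strip().upper()
    # Anything the router does not clearly mark SIMPLE goes to the premium model,
    # so a confused router costs money rather than answer quality.
    if label == "SIMPLE":
        return cheap(message)
    return premium(message)


def blended_cost(simple_fraction: float) -> float:
    """Expected cost per request, assuming every request pays the router call first."""
    router = CHEAP_COST
    return router + simple_fraction * CHEAP_COST + (1 - simple_fraction) * PREMIUM_COST


def savings(simple_fraction: float) -> float:
    """Fraction saved versus sending every request straight to the premium model."""
    return 1 - blended_cost(simple_fraction) / PREMIUM_COST
```

Under these assumptions, a 50% share of simple traffic gives `savings(0.5)` of about 0.45, while `savings(0.0)` is slightly negative because the router call is pure overhead when nothing can be routed away.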


2. Summarization vs. Raw History

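We discussed summarization for "Context Control" (Module 9), but it is also a cost control.

  • Case 1 (Raw): 50 turns of history. Every call resends all previous turns, so the per-call cost grows linearly with conversation length and the cumulative cost grows quadratically.
  • Case 2 (Summarized): One running summary replaces the older turns. Every call pays for the summary plus the most recent turns, so the per-call cost stays roughly constant and the cumulative cost grows linearly.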

Architect's Strategy: Use Haiku to summarize the history every 5 turns.
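
The difference between the two cases can be made concrete with a small token-count model. This is a rough sketch under stated assumptions: every turn adds the same number of tokens, only input tokens are counted, the summarizer reads the current context once every `every` turns, and the function names and parameters are illustrative rather than part of any API.

```python
def raw_history_cost(turns: int, tokens_per_turn: int) -> int:
    """Total input tokens billed when every call resends the full history."""
    # Turn t resends t turns: the new message plus everything before it.
    return sum(t * tokens_per_turn for t in range(1, turns + 1))


def summarized_history_cost(turns: int, tokens_per_turn: int,
                            summary_tokens: int, every: int = 5) -> int:
    """Total input tokens billed when history is folded into a summary every `every` turns."""
    total = 0
    unsummarized = 0       # raw turns accumulated since the last summary
    have_summary = False
    for _ in range(turns):
        unsummarized += 1
        context = unsummarized * tokens_per_turn
        if have_summary:
            context += summary_tokens
        total += context   # the main model reads the current context
        if unsummarized == every:
            # The summarizer reads the same context once to produce the new summary.
            total += context
            unsummarized = 0
            have_summary = True
    return total
```

With 200 tokens per turn and a 300-token summary, `raw_history_cost(50, 200)` comes to 255,000 input tokens, while `summarized_history_cost(50, 200, 300)` comes to 56,200. The gap is small for short conversations and widens as the turn count grows.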


3. The "Pareto Front" of Models

  • Use Haiku for: Classification, Sentiment, Formatting, Summarization.
  • Use Sonnet for: Coding, Planning, Multi-step reasoning.
  • Use Opus for: Final audits of mission-critical tasks where failure = $1M loss.
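
One way to encode this assignment is a small lookup table. The sketch below is illustrative: the tier names are placeholders rather than real model identifiers, the task-type keys are hypothetical labels, and falling back to the mid tier for unknown tasks is an assumption, not a rule from this lesson.

```python
# Tier names are placeholders, not real model identifiers.
TIER_FOR_TASK = {
    "classification": "haiku",
    "sentiment": "haiku",
    "formatting": "haiku",
    "summarization": "haiku",
    "coding": "sonnet",
    "planning": "sonnet",
    "multi_step_reasoning": "sonnet",
}


def pick_tier(task_type: str, mission_critical_audit: bool = False) -> str:
    """Return the model tier for a task, following the assignment above."""
    if mission_critical_audit:
        # Final audits of mission-critical work go to the top tier.
        return "opus"
    # Assumption: unknown task types fall back to the mid tier.
    return TIER_FOR_TASK.get(task_type, "sonnet")
```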

4. Visualizing Model Orchestration

graph LR
    U[User Message] --> R[Router: Haiku]
    R -->|GREETING| H[Haiku Generates Response]
    R -->|CODING| S[Sonnet Generates Code]
    S --> A[Auditor: Opus]
    A -->|FAIL| S
    A -->|PASS| U
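
The generate-and-audit loop in the diagram can be sketched as below. It is a minimal sketch under stated assumptions: `generate` and `audit` are hypothetical callables wrapping your own Sonnet and Opus calls, the auditor returns `None` on PASS and a feedback string on FAIL, and the `max_attempts` cap is an addition here, since an unbounded FAIL loop would keep spending on the two most expensive models.

```python
from typing import Callable, Optional


def generate_with_audit(task: str,
                        generate: Callable[[str, Optional[str]], str],
                        audit: Callable[[str, str], Optional[str]],
                        max_attempts: int = 3) -> str:
    """Regenerate until the auditor passes the draft or the attempt budget runs out."""
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        # The generator sees the auditor's feedback from the previous round, if any.
        draft = generate(task, feedback)
        feedback = audit(task, draft)   # None means PASS
        if feedback is None:
            return draft
    raise RuntimeError(f"audit still failing after {max_attempts} attempts: {feedback}")
```

Capping the attempts turns a worst-case open-ended bill into a known upper bound of `max_attempts` generator calls plus `max_attempts` auditor calls per task.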

5. Summary

  • Route simple tasks to cheaper models.
  • Compress history to prevent "Token Creep."
  • Audit with the cheapest model that can catch the failure, and reserve Opus for final audits of mission-critical tasks.

In the final lesson of this module, we look at how to watch your wallet: Monitoring and Managing Spend.


Interactive Quiz

  1. What is the "Cheap Router" pattern?
  2. Why is summarization considered a financial optimization?
  3. In what scenario would you still choose an expensive model for a simple task? (Hint: Latency).
  4. Scenario: You have a task that takes 10 turns. How does the cost differ between Raw History and Summarized History?
