
Lesson 1: Understanding Claude Pricing Models
Master the economics of tokens. Learn the difference between input, output, and cached tokens across the Claude 3.5 family, and how to build a cost model for your enterprise agent.
Module 12: Cost and Token Optimization
As a Certified Architect, you don't just care about smart code; you care about profitable code. AI inference is expensive: every token Claude reads or writes is billed. To scale an agent to 1 million users, you must understand the components of the Claude billing model.
In this lesson, we deconstruct how Anthropic charges for tokens and the "Price-Performance" tiering of the Claude 3.5 family.
1. The Token Billing Unit
Billing is divided into three distinct categories:
- Input Tokens: Everything you send to Claude (Prompt, History, Tools).
- Output Tokens: Everything Claude sends back (Reasoning, Answer, Tool Calls).
- Cached Tokens (The Architecture Win): Tokens Claude has recently seen and stored via prompt caching. Cache reads are billed at roughly a 90% discount to fresh input (cache writes carry a small premium).
Rule of Consumption:
Output tokens are significantly more expensive than input tokens (typically around 5x for Claude models). This is why terse model outputs are more profitable.
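The three billing categories can be folded into a single per-request cost function. A minimal sketch follows; the per-million-token prices are assumed placeholders (check Anthropic's current pricing page before relying on them):

```python
# ASSUMED placeholder prices in $ per 1M tokens -- not official figures.
PRICE_PER_MTOK = {
    "input": 3.00,          # base input price
    "output": 15.00,        # output price (~5x input)
    "cached_input": 0.30,   # cache-read price (~90% off input)
}

def request_cost(input_tokens: int, output_tokens: int,
                 cached_tokens: int = 0) -> float:
    """Estimate the dollar cost of a single API call."""
    fresh_input = input_tokens - cached_tokens
    return (
        fresh_input * PRICE_PER_MTOK["input"]
        + cached_tokens * PRICE_PER_MTOK["cached_input"]
        + output_tokens * PRICE_PER_MTOK["output"]
    ) / 1_000_000

# A 10k-token prompt with a 500-token answer, with and without
# 8k of the prompt served from cache:
print(f"uncached: ${request_cost(10_000, 500):.4f}")
print(f"cached:   ${request_cost(10_000, 500, cached_tokens=8_000):.4f}")
```

Run with these assumed prices, the cached call costs less than half the uncached one, because the bulk of the prompt is billed at the discounted cache-read rate.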
2. The Claude 3.5 Tiering
| Model | Speed | Intelligence | Best For... |
|---|---|---|---|
| Opus | Slow | Extreme | Research, Heavy Logic, One-shot complex tasks. |
| Sonnet | Fast | High | Coding, Orchestration, Complex Tool use. |
| Haiku | Ultra-Fast | Moderate | Routing, Summarization, High-volume classification. |
Architect's Strategy: Never use Opus if Sonnet can do the job. Never use Sonnet if Haiku can do the job.
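The "Minimum Viable Model" rule can be implemented as a simple routing table in front of your API calls. The task categories and model names below are illustrative assumptions, not an official API:

```python
# Illustrative routing table: cheapest model that can handle each task.
# Task labels and model names are assumed for this sketch.
ROUTING_TABLE = {
    "classify": "claude-haiku",      # high-volume, low-complexity
    "summarize": "claude-haiku",
    "code": "claude-sonnet",         # coding and orchestration
    "orchestrate": "claude-sonnet",
    "research": "claude-opus",       # heavy one-shot reasoning
}

def pick_model(task_type: str) -> str:
    """Route each task to the cheapest capable tier; default to the mid-tier."""
    return ROUTING_TABLE.get(task_type, "claude-sonnet")

print(pick_model("classify"))
```

Defaulting unknown tasks to the mid-tier (rather than the cheapest) is a deliberate trade-off: a wrong answer from Haiku often costs more in retries than the Sonnet premium.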
3. The "Token-Per-Turn" Calculation
To estimate the cost of a multi-turn agent:
Total Cost = Σ over turns of [ (System + History_t + New_Message_t) × Input_Price + Answer_t × Output_Price ]
where History_t contains every earlier message and answer, so it grows with each turn.
As the conversation gets longer, the Input Tokens (carrying the history) become the dominant cost. This is why Context Pruning (Module 9, Lesson 4) is a financial necessity, not just a technical one.
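The growth described above can be sketched with a small simulation. Token counts and prices are assumed placeholders, and `prune_to` is a hypothetical cap on retained history, standing in for the pruning techniques of Module 9:

```python
# Simulates cumulative cost as history accumulates across turns.
# Prices and token counts are ASSUMED for illustration only.
INPUT_PRICE = 3.00 / 1_000_000    # $/token for input (assumed)
OUTPUT_PRICE = 15.00 / 1_000_000  # $/token for output (assumed)

def conversation_cost(turns, system_tokens=1_000, user_tokens=200,
                      answer_tokens=300, prune_to=None):
    """Total cost when each turn resends the system prompt plus history."""
    history = 0
    total = 0.0
    for _ in range(turns):
        if prune_to is not None:
            history = min(history, prune_to)  # hypothetical history cap
        input_tokens = system_tokens + history + user_tokens
        total += input_tokens * INPUT_PRICE + answer_tokens * OUTPUT_PRICE
        history += user_tokens + answer_tokens  # history grows every turn
    return total

print(f"10 turns, no pruning:   ${conversation_cost(10):.4f}")
print(f"10 turns, pruned to 1k: ${conversation_cost(10, prune_to=1_000):.4f}")
```

Because unpruned history grows linearly per turn, cumulative input cost grows roughly quadratically with the number of turns, while output cost only grows linearly; the simulation makes the gap visible even at 10 turns.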
4. Visualizing the Cost Mix
```mermaid
pie
    title Average Cost Distribution (Multi-turn)
    "System Prompt (Input)": 15
    "Conversation History (Input)": 45
    "Tools & Results (Input)": 20
    "Actual Answer (Output)": 20
```
5. Summary
- Input is cheap; Output is expensive.
- Caching is your best friend (~90% savings on cached input).
- Haiku-Sonnet-Opus is a hierarchy: choose the "Minimum Viable Model."
In the next lesson, we look at how to shrink these costs: Token-Efficient Prompt Design.
Interactive Quiz
- Why are input tokens cheaper than output tokens?
- What is the financial benefit of "Prompt Caching"?
- When would you choose Claude Haiku over Claude Sonnet for an architectural task?
- Scenario: A user sends 10 messages in a row. How does the total cost change if you don't prune the conversation history?
Reference Video: