
Lesson 1: Understanding Claude Pricing Models
Master the economics of tokens. Learn the difference between input, output, and cached tokens across the Claude 3.5 family, and how to build a cost model for your enterprise agent.
Module 12: Cost and Token Optimization
As a Certified Architect, you don't just care about smart code; you care about profitable code. AI inference is expensive: every token Claude reads or writes is billed. To scale an agent to 1 million users, you must understand the components of the Claude billing model.
In this lesson, we deconstruct how Anthropic charges for tokens and the "Price-Performance" tiering of the Claude 3.5 family.
1. The Token Billing Unit
Billing is divided into three distinct categories:
- Input Tokens: Everything you send to Claude (Prompt, History, Tools).
- Output Tokens: Everything Claude sends back (Reasoning, Answer, Tool Calls).
- Cached Tokens (The Architecture Win): Tokens Claude has recently seen and stored via prompt caching. Cache reads are billed at roughly a 90% discount to fresh input (cache writes carry a small premium).
Rule of Consumption:
Output tokens are significantly more expensive than input tokens (typically around 5x for Claude models). This is why terse model outputs are more profitable.
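The three billing categories can be folded into a single per-request cost function. A minimal sketch follows; the per-million-token prices are assumed placeholders (check Anthropic's current pricing page before relying on them):

```python
# ASSUMED placeholder prices in $ per 1M tokens -- not official figures.
PRICE_PER_MTOK = {
    "input": 3.00,          # base input price
    "output": 15.00,        # output price (~5x input)
    "cached_input": 0.30,   # cache-read price (~90% off input)
}

def request_cost(input_tokens: int, output_tokens: int,
                 cached_tokens: int = 0) -> float:
    """Estimate the dollar cost of a single API call."""
    fresh_input = input_tokens - cached_tokens
    return (
        fresh_input * PRICE_PER_MTOK["input"]
        + cached_tokens * PRICE_PER_MTOK["cached_input"]
        + output_tokens * PRICE_PER_MTOK["output"]
    ) / 1_000_000

# A 10k-token prompt with a 500-token answer, with and without
# 8k of the prompt served from cache:
print(f"uncached: ${request_cost(10_000, 500):.4f}")
print(f"cached:   ${request_cost(10_000, 500, cached_tokens=8_000):.4f}")
```

Run with these assumed prices, the cached call costs less than half the uncached one, because the bulk of the prompt is billed at the discounted cache-read rate.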
2. The Claude 3.5 Tiering
| Model | Speed | Intelligence | Best For... |
|---|---|---|---|
| Opus | Slow | Extreme | Research, Heavy Logic, One-shot complex tasks. |
| Sonnet | Fast | High | Coding, Orchestration, Complex Tool use. |
| Haiku | Ultra-Fast | Moderate | Routing, Summarization, High-volume classification. |
Architect's Strategy: Never use Opus if Sonnet can do the job. Never use Sonnet if Haiku can do the job.
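The "Minimum Viable Model" rule can be implemented as a simple routing table in front of your API calls. The task categories and model names below are illustrative assumptions, not an official API:

```python
# Illustrative routing table: cheapest model that can handle each task.
# Task labels and model names are assumed for this sketch.
ROUTING_TABLE = {
    "classify": "claude-haiku",      # high-volume, low-complexity
    "summarize": "claude-haiku",
    "code": "claude-sonnet",         # coding and orchestration
    "orchestrate": "claude-sonnet",
    "research": "claude-opus",       # heavy one-shot reasoning
}

def pick_model(task_type: str) -> str:
    """Route each task to the cheapest capable tier; default to the mid-tier."""
    return ROUTING_TABLE.get(task_type, "claude-sonnet")

print(pick_model("classify"))
```

Defaulting unknown tasks to the mid-tier (rather than the cheapest) is a deliberate trade-off: a wrong answer from Haiku often costs more in retries than the Sonnet premium.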
3. The "Token-Per-Turn" Calculation
To estimate the cost of a multi-turn agent:
Total Cost = Σ over turns of [ (System + History_t + New_Message_t) × Input_Price + Answer_t × Output_Price ]
where History_t contains every earlier message and answer, so it grows with each turn.
As the conversation gets longer, the Input Tokens (carrying the history) become the dominant cost. This is why Context Pruning (Module 9, Lesson 4) is a financial necessity, not just a technical one.
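The growth described above can be sketched with a small simulation. Token counts and prices are assumed placeholders, and `prune_to` is a hypothetical cap on retained history, standing in for the pruning techniques of Module 9:

```python
# Simulates cumulative cost as history accumulates across turns.
# Prices and token counts are ASSUMED for illustration only.
INPUT_PRICE = 3.00 / 1_000_000    # $/token for input (assumed)
OUTPUT_PRICE = 15.00 / 1_000_000  # $/token for output (assumed)

def conversation_cost(turns, system_tokens=1_000, user_tokens=200,
                      answer_tokens=300, prune_to=None):
    """Total cost when each turn resends the system prompt plus history."""
    history = 0
    total = 0.0
    for _ in range(turns):
        if prune_to is not None:
            history = min(history, prune_to)  # hypothetical history cap
        input_tokens = system_tokens + history + user_tokens
        total += input_tokens * INPUT_PRICE + answer_tokens * OUTPUT_PRICE
        history += user_tokens + answer_tokens  # history grows every turn
    return total

print(f"10 turns, no pruning:   ${conversation_cost(10):.4f}")
print(f"10 turns, pruned to 1k: ${conversation_cost(10, prune_to=1_000):.4f}")
```

Because unpruned history grows linearly per turn, cumulative input cost grows roughly quadratically with the number of turns, while output cost only grows linearly; the simulation makes the gap visible even at 10 turns.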
4. Visualizing the Cost Mix
```mermaid
pie
    title Average Cost Distribution (Multi-turn)
    "System Prompt (Input)": 15
    "Conversation History (Input)": 45
    "Tools & Results (Input)": 20
    "Actual Answer (Output)": 20
```
5. Summary
- Input is cheap; Output is expensive.
- Caching is your best friend (~90% savings on cached input).
- Haiku-Sonnet-Opus is a hierarchy: choose the "Minimum Viable Model."
In the next lesson, we look at how to shrink these costs: Token-Efficient Prompt Design.
Interactive Quiz
- Why are input tokens cheaper than output tokens?
- What is the financial benefit of "Prompt Caching"?
- When would you choose Claude Haiku over Claude Sonnet for an architectural task?
- Scenario: A user sends 10 messages in a row. How does the total cost change if you don't prune the conversation history?
Reference Video: