Lesson 1: Understanding Claude Pricing Models

Master the economics of tokens. Learn the difference between input, output, and cached tokens across the Claude 3.5 family, and how to build a cost model for your enterprise agent.


Module 12: Cost and Token Optimization

Lesson 1: Understanding Claude Pricing Models

As a Certified Architect, you don't just care about "smart" code; you care about profitable code. AI inference is expensive: every token Claude processes or generates is billed. To scale an agent to one million users, you must understand the mathematical components of Claude's billing model.

In this lesson, we deconstruct how Anthropic charges for tokens and the "Price-Performance" tiering of the Claude 3.5 family.


1. The Token Billing Unit

Billing is divided into three distinct categories:

  • Input Tokens: Everything you send to Claude (Prompt, History, Tools).
  • Output Tokens: Everything Claude sends back (Reasoning, Answer, Tool Calls).
  • Cached Tokens (The Architecture Win): Input tokens Claude has recently seen and stored via prompt caching. Cache reads are billed at roughly a 90% discount off the base input rate (cache writes carry a small surcharge).

Rule of Consumption:

Output tokens are significantly more expensive (roughly 5x the input rate) than input tokens. This is why terse model outputs are more profitable.
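The three billing categories can be combined into a simple per-call cost function. A minimal sketch, assuming illustrative placeholder prices (check Anthropic's pricing page for current rates):

```python
# Illustrative cost calculator for a single Claude API call.
# Prices are hypothetical placeholders (USD per million tokens),
# NOT official Anthropic rates.
PRICE_PER_MTOK = {
    "input": 3.00,          # base input tokens
    "cached_input": 0.30,   # cache reads (~90% discount on input)
    "output": 15.00,        # output tokens (~5x the input rate)
}

def call_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Return the USD cost of one call, splitting input into cached/uncached."""
    uncached = input_tokens - cached_tokens
    return (
        uncached * PRICE_PER_MTOK["input"]
        + cached_tokens * PRICE_PER_MTOK["cached_input"]
        + output_tokens * PRICE_PER_MTOK["output"]
    ) / 1_000_000

# A 10k-token prompt (8k of it cached) producing a 500-token answer:
print(f"${call_cost(10_000, 500, cached_tokens=8_000):.4f}")
```

Note how the 500 output tokens cost nearly half the total despite being 5% of the volume: that is the 5x output multiplier at work.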


2. The Claude 3.5 Tiering

| Model  | Speed      | Intelligence | Best For...                                          |
|--------|------------|--------------|------------------------------------------------------|
| Opus   | Slow       | Extreme      | Research, heavy logic, one-shot complex tasks.       |
| Sonnet | Fast       | High         | Coding, orchestration, complex tool use.             |
| Haiku  | Ultra-Fast | Moderate     | Routing, summarization, high-volume classification.  |

Architect's Strategy: Never use Opus if Sonnet can do the job. Never use Sonnet if Haiku can do the job.
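The "Minimum Viable Model" rule can be encoded as a simple routing table. A sketch with hypothetical task labels and tier names (the mapping itself is an assumption you would tune for your own workload):

```python
# "Minimum Viable Model" router sketch.
# Task labels and the task→tier mapping are illustrative assumptions.
TASK_REQUIREMENTS = {
    "classify":    "haiku",   # high-volume, low-complexity
    "summarize":   "haiku",
    "code":        "sonnet",  # complex tool use / generation
    "orchestrate": "sonnet",
    "research":    "opus",    # heavy one-shot reasoning
}

def pick_model(task: str) -> str:
    """Return the cheapest model tier believed capable of the task."""
    # Unknown tasks default to the middle tier rather than the most expensive.
    return TASK_REQUIREMENTS.get(task, "sonnet")

print(pick_model("classify"))  # haiku
```

In production, this static table is often replaced by a Haiku call that classifies the request and routes it; the classifier's own cost is tiny compared to the savings.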


3. The "Token-Per-Turn" Calculation

To estimate the cost of a multi-turn agent: Total Cost = (Input_Base + (Input_Turn * Turns)) + (Output_Base * Turns), where Input_Base is the fixed prompt sent on every call (system prompt, tool definitions), Input_Turn is the average new input added per turn, and Output_Base is the average output per turn.

As the conversation gets longer, the Input Tokens (carrying the full history, which is re-sent and re-billed on every turn) become the dominant cost. This is why Context Pruning (Module 9, Lesson 4) is a financial necessity, not just a technical one.
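The history effect is easy to see in a simulation. A sketch with assumed per-turn token counts (the specific numbers are illustrative, not measured):

```python
# Simulate token consumption over an unpruned multi-turn conversation.
# Per-turn token counts below are illustrative assumptions.
SYSTEM_PROMPT = 1_000   # re-sent as input on every turn
USER_MSG = 200          # new user input per turn
ASSISTANT_MSG = 400     # model output per turn; re-billed as input later

def conversation_tokens(turns: int) -> tuple[int, int]:
    """Return (total input tokens, total output tokens) with no pruning."""
    total_input = total_output = 0
    history = 0
    for _ in range(turns):
        total_input += SYSTEM_PROMPT + history + USER_MSG
        total_output += ASSISTANT_MSG
        history += USER_MSG + ASSISTANT_MSG  # history grows every turn
    return total_input, total_output

for n in (1, 5, 10):
    i, o = conversation_tokens(n)
    print(f"{n:>2} turns: input={i:,}  output={o:,}")
```

Output grows linearly with turns, but input grows quadratically, because each turn re-sends everything said so far. Pruning (or caching) the history is what bends that curve back down.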


4. Visualizing the Cost Mix

```mermaid
pie
    title Average Cost Distribution (Multi-turn)
    "System Prompt (Input)": 15
    "Conversation History (Input)": 45
    "Tools & Results (Input)": 20
    "Actual Answer (Output)": 20
```

5. Summary

  • Input is cheap; Output is expensive.
  • Cache is your best friend (90% savings).
  • Haiku-Sonnet-Opus is a hierarchy: choose the "Minimum Viable Model."

In the next lesson, we look at how to shrink these costs: Token-Efficient Prompt Design.


Interactive Quiz

  1. Why are input tokens cheaper than output tokens?
  2. What is the financial benefit of "Prompt Caching"?
  3. When would you choose Claude Haiku over Claude Sonnet for an architectural task?
  4. Scenario: A user sends 10 messages in a row. How does the total cost change if you don't prune the conversation history?

