Module 10: Reliability and Guardrails

Lesson 1: Identifying Common Failure Modes

To build a reliable system, you must first understand how it breaks. Unlike traditional software (where code is deterministic), AI systems fail in Probabilistic ways. You can't just "Fix the bug"; you have to Defuse the probability of the failure occurring.

In this lesson, we identify the four "Failure Archetypes" of agentic AI.

1. Type A: Tool-Choice Hallucination

The model calls a tool that doesn't exist, or it invents a parameter that isn't in the schema.

Cause: Vague tool descriptions or overly complex JSON schemas.
Defense: Strict JSON Schema (Module 8) and simplified naming conventions (Module 4).

2. Type B: The Reasoning Loop

The model identifies a task, tries to solve it, fails, and then tries the exact same solution again and again.

Cause: High temperature (randomness) or lack of "Self-Reflection" instructions.
Defense: Sequence Detection and "Reflection" steps (Module 2).

3. Type C: Goal Drift

The agent starts a task (e.g., "Fix a bug") but gets distracted by a secondary observation (e.g., "Hey, this variable name is and could be better") and ends up refactoring the whole repo while forgetting the original bug.

Cause: Long context windows (SNR drop) and weak instruction anchoring.
Defense: Scoped Rules (Module 6) and "CURRENT_STATUS" anchors (Module 9).

4. Type D: Output Format Violation

The model produces a beautiful answer but wraps it in text like "Sure, here is the JSON you asked for:" which breaks your parser.

Cause: Conversational "Prose" defaults of the model.
Defense: "Output MUST be JSON" guardrails and parsing retry loops (Module 8).

5. Visualizing the Failure Hierarchy

graph TD
    A[System Failure] --> B[Tooling Error]
    A --> C[Logical Error]
    A --> D[Format Error]
    B --> B1[Parameter Hallucination]
    C --> C1[Infinite Loop]
    C --> C2[Goal Drift]
    D --> D1[JSON Parse Error]

6. Summary

A reliable architect is a "Failure Detective." By recognizing these four archetypes, you can design specific Guardrails (Module 10, Lesson 3) to prevent them before they manifest in production.

In the next lesson, we look at the simplest way to handle these failures: Retry Logic and Backoff Strategies.

Interactive Quiz

What is "Tool-Choice Hallucination"?
Why is "Goal Drift" particularly dangerous in autonomous agents?
What is the difference between a "Format Error" and a "Logical Error"?
Scenario: Your agent is supposed to extract email addresses. 5% of the time, it returns "No email found" even when one exists. Which Failure archetype is this, and how would you start investigating?

Reference Video: