
The Agent's Diary: Persistent Memory
Build agents that grow over time. Learn how to implement persistent memory that survives between sessions and how to distinguish between facts and preferences.
Ephemeral vs Persistent Memory
In Module 2.2, we discussed the basics of State. Now we go deeper into long-term, persistent memory. Most agents today have "Amnesia": once a session ends, they forget everything. To build a truly personal agent, it must have a Long-Term Memory (LTM) that persists across days, months, and thousands of distinct threads.
In this lesson, we will explore how to architect a "Personal Cache" for your agents.
1. Defining the Three Tiers of Memory
To be effective, an agent must manage information at three different "speeds":
- Short-Term (Conversation): The current chat window. (Deleted when the thread is closed).
- Medium-Term (Working State): Variables like `is_logged_in` or `current_project_id`. (Held in the checkpointer).
- Long-Term (The Identity): "The user prefers Python," "The user's daughter is named Sarah," "The user works at Google." (Stored in a Vector or Graph DB).
2. The Implementation: Memory-as-a-Tool
We don't send the "Entire History" of a user to the LLM. That would be millions of tokens. Instead, we give the agent two tools:
- `save_to_memory(fact: str)`: "The user just told me they are vegan. I should remember this."
- `retrieve_memory(query: str)`: "Does the user have any food allergies?"
The Flow
- The agent calls `save_to_memory`.
- The backend embeds the text and saves it to a User-Specific Vector Index.
- Future sessions query this index before every response.
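The flow above can be sketched with an in-memory stand-in for the vector index (naive keyword overlap instead of embeddings; all names here are illustrative, not a real API):

```python
# user_id -> list of saved facts (a real system would store embeddings).
memory_index: dict[str, list[str]] = {}

def save_to_memory(user_id: str, fact: str) -> None:
    """Tool the agent calls when it decides a fact is worth keeping."""
    memory_index.setdefault(user_id, []).append(fact)

def retrieve_memory(user_id: str, query: str) -> list[str]:
    """Naive retrieval: return facts that share a word with the query."""
    words = set(query.lower().split())
    return [f for f in memory_index.get(user_id, [])
            if words & set(f.lower().split())]

save_to_memory("user_123", "The user is vegan")
print(retrieve_memory("user_123", "any vegan food preferences?"))
# -> ['The user is vegan']
```

In production, the keyword match would be replaced by an embedding similarity search against a per-user index, but the tool surface the agent sees stays the same.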
3. Fact vs. Context vs. Preference
Not everything should be "Remembered" forever.
- Facts (Static): "The user lives in NYC." -> Save to LTM.
- Context (Transient): "The user is currently at the airport." -> Save to Short-Term Memory.
- Preferences (Operational): "The user likes short, direct answers." -> Save to System Prompt / Profile.
4. The "Summary Memory" Strategy
As we saw in Module 3.3, we can summarize history. But what do we do with the summary? Production Pattern: Every time a thread finishes, a "Reflector Agent" reads the whole chat and extracts Entity Updates.
- Update: "Increment 'Coffee_Count' for User_123."
- Update: "Change 'Project_Status' from 'Active' to 'Done'."
This transforms raw text into Actionable Data for the next session.
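Applying those Entity Updates might look like this (the update schema and in-memory profile store are assumptions for illustration):

```python
# Per-user structured profiles, updated by the Reflector Agent's output.
profiles = {"User_123": {"Coffee_Count": 4, "Project_Status": "Active"}}

def apply_update(user_id: str, update: dict) -> None:
    """Apply one structured Entity Update to a user's profile."""
    profile = profiles[user_id]
    if update["op"] == "increment":
        profile[update["field"]] += 1
    elif update["op"] == "set":
        profile[update["field"]] = update["value"]

apply_update("User_123", {"op": "increment", "field": "Coffee_Count"})
apply_update("User_123", {"op": "set", "field": "Project_Status",
                          "value": "Done"})
print(profiles["User_123"])
# -> {'Coffee_Count': 5, 'Project_Status': 'Done'}
```

Because the reflector emits structured operations rather than prose, the next session can read the profile directly instead of re-parsing old transcripts.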
5. Privacy: The "Stakes" of the Mirror
Long-term memory is the highest privacy risk in AI. If your database is hacked, the hacker doesn't just get a password; they get a Digital Mirror of the user's life.
Security Rules for LTM
- Encryption at Rest: The vector database must be encrypted with a user-specific key.
- Explicit Consent: The agent should say "Should I remember that for next time?" before calling the `save` tool.
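A consent gate can be a thin wrapper around the save tool (a minimal sketch; `ask_user` stands in for whatever confirmation UI the product provides):

```python
def maybe_remember(fact: str, ask_user, save_to_memory) -> bool:
    """Persist `fact` only if the user explicitly confirms."""
    if ask_user(f"Should I remember that for next time? ({fact})"):
        save_to_memory(fact)
        return True
    return False

saved = []
# User says yes -> the fact is persisted.
assert maybe_remember("The user is vegan", lambda q: True, saved.append)
# User says no -> nothing is written.
assert not maybe_remember("One-off note", lambda q: False, saved.append)
assert saved == ["The user is vegan"]
```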
6. Implementation Strategy: Memory in LangGraph
We use a Post-Processing Node that runs after the conversation is over.
```python
async def memory_reflector_node(state):
    # Only run this at the END of a session.
    # Assumes `llm` and `vector_db` clients are configured elsewhere.
    summary = await llm.ainvoke(
        f"Extract key personal facts from this chat: {state['messages']}"
    )
    await vector_db.add(summary, metadata={"user_id": state["user_id"]})
    return state
```
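You can exercise this node end-to-end without live services by stubbing its collaborators (the node is repeated here with illustrative `StubLLM` / `StubVectorDB` stand-ins so the sketch is self-contained):

```python
import asyncio

class StubLLM:
    async def ainvoke(self, prompt: str) -> str:
        # A real LLM would summarize; the stub returns a canned extraction.
        return "User is vegan; works at Google."

class StubVectorDB:
    def __init__(self):
        self.records = []
    async def add(self, text: str, metadata: dict) -> None:
        self.records.append((text, metadata))

llm, vector_db = StubLLM(), StubVectorDB()

async def memory_reflector_node(state):
    # Only run this at the END of a session.
    summary = await llm.ainvoke(
        f"Extract key personal facts from this chat: {state['messages']}"
    )
    await vector_db.add(summary, metadata={"user_id": state["user_id"]})
    return state

state = {"messages": ["I'm vegan", "I work at Google"],
         "user_id": "user_123"}
asyncio.run(memory_reflector_node(state))
print(vector_db.records[0][1])
# -> {'user_id': 'user_123'}
```

Swapping the stubs for real clients leaves the node body unchanged, which is what makes the post-processing pattern easy to test.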
Summary and Mental Model
Think of Long-Term Memory like The Notes a Doctor Takes.
- They don't remember every single word you said in the check-up (Short-term).
- But they look at your chart (Long-term) to see your history before you walk in the door.
An agent without a chart is just a stranger.
Exercise: Memory Design
- Categorization: Which memory (Short, Medium, or Long) would you use for these 3 items:
- User's Birthday.
- The fact that the user is currently "In a hurry."
- The partial code written in the last message.
- Refinement: Write a prompt for the "Memory Reflector" that prevents it from saving PII like Credit Card numbers into the long-term vector store.
- UX: Describe how you would show a user "What the agent knows about them."
- (Hint: A "Personal Profile" page where they can edit or delete memories.)

Ready for complex relationships? Next lesson: Graph Databases for Complex Relationships.