Agent memory: Letta vs Mem0 vs Zep vs Cognee

If you’re building AI agents, you’ve probably run into a fundamental limitation: LLMs forget everything the moment you close the conversation. This article explains how agent memory works and compares five popular solutions to the problem: Mem0, Zep, Cognee, Letta Platform, and AI Memory SDK.

Agent memory

When you call an LLM like GPT-4 or Claude, it only knows what’s in the current prompt. There’s no persistent memory of past conversations, user preferences, or learned behaviors. This stateless design works fine for one-off questions, but breaks down when building agents that need to maintain context across multiple interactions.

Agent memory solves this by adding persistent storage that agents can read from and write to. It transforms stateless LLMs into stateful agents that can remember user preferences, learn from past interactions, and provide personalized experiences.

The simplest form of memory is keeping the entire conversation history in the context window, but context windows have limits, which makes this approach expensive and slow. The difference between stateless and stateful designs comes down to how context is managed: stateless agents require manual prompt engineering to manage an unstructured context window, while stateful agents use structured memory blocks. A memory system like Letta automatically compiles context from persistent state storage into these organized blocks.

Agent memory typically involves two components:

  • Working memory is the information currently loaded in the agent’s context window, which functions similarly to your short-term memory during a conversation. It holds the immediate context needed to respond to the current request.
  • Long-term storage consists of external databases (vector stores, knowledge graphs, or traditional databases) that store information beyond what fits in the context. Agents retrieve relevant information from long-term storage and load it into working memory as needed.

The challenge is deciding what information to store, how to organize it, and when to retrieve it. Different memory solutions approach these decisions in different ways.
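
As a rough illustration of how the two components interact, here is a minimal, library-agnostic sketch; the helper names (search_memories, call_llm, remember) are placeholders rather than any real API:

# Minimal sketch of the working-memory / long-term-storage split.
# search_memories and call_llm are hypothetical placeholders, not a real library.

long_term_store: list[str] = []  # stand-in for a vector store or database

def remember(fact: str) -> None:
    long_term_store.append(fact)

def search_memories(query: str, k: int = 3) -> list[str]:
    # Real systems rank by embedding similarity; word overlap is enough for a sketch.
    query_words = set(query.lower().split())
    ranked = sorted(long_term_store,
                    key=lambda f: len(query_words & set(f.lower().split())),
                    reverse=True)
    return ranked[:k]

def call_llm(prompt: str) -> str:
    return f"(model response to a {len(prompt)}-character prompt)"  # stub

def answer(user_message: str) -> str:
    relevant = search_memories(user_message)                    # retrieve from long-term storage
    prompt = f"Known facts: {relevant}\nUser: {user_message}"   # working memory = context window
    reply = call_llm(prompt)
    remember(user_message)                                      # write new information back
    return reply

remember("User prefers vegetarian food")
print(answer("What should I cook for dinner?"))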

How agent memory works

Before diving into specific tools, we’ll look at the different architectural approaches to agent memory. Each solution in this article takes one (or a combination) of these approaches.

Drop-in memory layers

This approach treats memory as an external service that wraps around your existing stateless agent. You continue using your LLM as before, but prior to each request, the memory layer retrieves relevant context and injects it into the prompt. After each response, it extracts and stores important information.

Think of drop-in memory layers like adding a note-taking assistant that sits between you and your agent. The agent itself stays stateless, but the memory layer manages what to remember and when to surface it.

Tools like Mem0 use this approach. They provide APIs to store and retrieve memories, but your agent architecture stays largely the same.

Temporal knowledge graphs

Facts change. A user might move from San Francisco to New York, then back to San Francisco. Their job title might change. Their preferences might evolve. Most memory systems either overwrite old information or accumulate contradictions.

Temporal knowledge graphs solve the problem of lost or contradictory information by tracking when facts are valid. Instead of storing “User lives in San Francisco,” they store “User lived in San Francisco (valid: Jan 2023–May 2024),” and “User lives in New York (valid: May 2024–present).”

A temporal knowledge graph maps entities (users, places, and concepts) as nodes and relationships as edges with time metadata. When you query a graph, you can ask for current facts or historical context.
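
As a concrete sketch of this idea (independent of Zep; the field names below are illustrative, not Zep's schema), temporal facts can be modeled as edges with validity windows:

from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class TemporalEdge:
    subject: str
    relation: str
    obj: str
    valid_from: datetime
    valid_until: Optional[datetime] = None  # None means the fact is still valid

edges = [
    TemporalEdge("user", "lives_in", "San Francisco",
                 valid_from=datetime(2023, 1, 1), valid_until=datetime(2024, 5, 1)),
    TemporalEdge("user", "lives_in", "New York", valid_from=datetime(2024, 5, 1)),
]

def facts_at(when: datetime) -> list[TemporalEdge]:
    """Return the facts that were valid at a given point in time."""
    return [e for e in edges
            if e.valid_from <= when and (e.valid_until is None or when < e.valid_until)]

print(facts_at(datetime(2023, 6, 1)))   # user lived in San Francisco
print(facts_at(datetime(2025, 1, 1)))   # user lives in New York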

Zep uses this approach with its Graphiti engine, which builds temporal knowledge graphs from conversations and business data.

Semantic knowledge graphs

While temporal knowledge graphs focus on when facts are valid, semantic knowledge graphs focus on what facts mean and how they relate to one another. This approach uses ontologies (structured representations of domain knowledge) to understand the relationships between concepts.

Instead of just storing chunks of text in a vector database, semantic knowledge graphs build structured representations of how the text chunks relate to one another. When you mention “Python,” the system understands it relates to “programming languages,” “Guido van Rossum,” and “machine learning libraries.” These relationships enable more intelligent retrieval.

This approach combines vector search for semantic similarity with graph traversal for relationship-based queries. You can ask, “What programming languages has the user worked with?” and get answers that understand the conceptual relationships.
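
As a toy illustration (not Cognee's internal representation), a semantic graph can be thought of as (subject, relation, object) triples that retrieval can traverse in addition to plain similarity search:

# Illustrative triples; a real system extracts these automatically from documents.
triples = [
    ("Python", "is_a", "programming language"),
    ("scikit-learn", "is_a", "machine learning library"),
    ("scikit-learn", "written_in", "Python"),
    ("Guido van Rossum", "created", "Python"),
]

def related(entity: str) -> list[tuple[str, str, str]]:
    """Return every triple that mentions the entity, in either position."""
    return [t for t in triples if entity in (t[0], t[2])]

# "What do we know about Python?" follows relationships, not just keyword matches.
for subj, rel, obj in related("Python"):
    print(subj, rel, obj)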

Cognee builds these semantic knowledge graphs on the fly during document processing and retrieval.

Stateful programming

The previous three approaches treat memory as a feature you add to stateless agents. But there’s a fundamentally different approach: designing agents to be stateful from the ground up.

In stateful programming, memory isn’t an add-on. The agent’s entire architecture revolves around maintaining and modifying state over time. Instead of retrieving memories before each request, the agent continuously operates with persistent state.

This is similar to the difference between functional programming (stateless) and object-oriented programming (stateful). In functional programming, functions don’t maintain state between calls. In object-oriented programming, objects maintain internal state that persists and evolves.
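
In code, the distinction looks roughly like this (a toy sketch, not any particular framework's API):

# Stateless: every call starts from scratch; context must be passed in each time.
def stateless_agent(message: str, history: list[str]) -> str:
    prompt = "\n".join(history + [message])
    return f"(response to a prompt of {len(prompt)} characters)"

# Stateful: the agent object owns and updates its state across calls.
class StatefulAgent:
    def __init__(self, persona: str):
        self.memory = {"persona": persona, "facts": []}

    def step(self, message: str) -> str:
        if "i'm" in message.lower():
            self.memory["facts"].append(message)  # the agent decides what to keep
        return f"(response informed by {len(self.memory['facts'])} remembered facts)"

agent = StatefulAgent(persona="helpful assistant")
agent.step("Hi, I'm Alex and I'm a vegetarian.")
agent.step("What should I cook tonight?")  # still knows about Alex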

Stateful agents have memory blocks that they can read and write to directly. These memory blocks are part of the agent’s core architecture, not external storage accessed through APIs. The agent can decide what to remember, what to forget, and how to organize its own memory.

The Letta Platform uses this approach. Agents exist as persistent entities with self-modifying memory, not as stateless functions called on demand.

For developers who want memory capabilities without adopting the full stateful paradigm, pluggable memory solutions like Letta’s AI Memory SDK provide a middle ground. These are lightweight libraries that add memory to existing applications without requiring an architectural shift.

Mem0: Universal memory layer for LLM applications

Mem0 (pronounced “mem-zero”) is a drop-in memory layer that adds persistent memory to AI applications. It sits between your LLM and your application logic, automatically extracting and storing important information from conversations.

How Mem0 works

Mem0 uses a two-phase approach to manage memory.

  • Extraction phase: When you add messages to Mem0, it analyzes the conversation to identify important facts, preferences, and context. It uses an LLM (GPT-4 by default) to determine what’s worth remembering. This isn’t just storing raw messages. Mem0 distills conversations into structured memories like “User prefers vegetarian food” or “User works as a backend developer in Python.”

  • Update phase: After extraction, Mem0 intelligently updates its memory store using a hybrid architecture that combines vector databases (for semantic search), graph databases (for relationships), and key-value stores (for fast lookups). If new information contradicts existing memories, it updates them. If information is redundant, it consolidates. This prevents memory from growing unbounded with duplicate or outdated information. According to research by the Mem0 team, this approach achieves over 90% reduction in token costs compared to full-context methods, along with 26% better accuracy and 91% lower latency.

Multi-level memory: User, session, and agent state

Mem0 organizes memories into three levels:

  • User memory stores long-term facts about individual users that persist across all sessions. This includes preferences, biographical information, and historical context. User memories are identified by a user_id.
  • Session memory stores temporary context for a specific conversation or task. Session memories are useful for maintaining context within a single interaction but can be discarded afterward. These use a session_id.
  • Agent memory stores information about the agent itself, its capabilities, learned behaviors, or instructions. This helps agents maintain consistent personas and improve over time. These use an agent_id.
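
Once Mem0 is installed (see the next section), these scopes map onto its add calls roughly as shown below. The user_id and agent_id parameters follow Mem0's documented API; the identifier used for session scope has varied across releases (run_id in recent versions), so treat that detail as an assumption and check the version you're using:

from mem0 import Memory

m = Memory()

# User-level: long-term facts about a person, shared across all sessions.
m.add([{"role": "user", "content": "I'm vegetarian."}], user_id="alex")

# Agent-level: facts about the agent itself (persona, learned instructions).
m.add([{"role": "user", "content": "Always answer concisely."}], agent_id="support-bot")

# Session-level: short-lived context for one conversation.
# Recent Mem0 releases identify this scope with run_id (assumption: verify for your version).
m.add([{"role": "user", "content": "Today I'm planning a dinner party."}],
      user_id="alex", run_id="session-42")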

Implementing Mem0 in your application

Install Mem0:

pip install mem0ai

Complete the basic setup and memory storage:

from mem0 import Memory

# Initialize memory
m = Memory()

# Add a conversation to memory
messages = [
    {"role": "user", "content": "Hi, I'm Alex. I'm a vegetarian."},
    {"role": "assistant", "content": "Hello Alex! I'll remember that you're vegetarian."},
    {"role": "user", "content": "I love Italian food."},
    {"role": "assistant", "content": "Great! I'll keep that in mind."}
]

# Store with user ID
m.add(messages, user_id="alex")

Retrieve memories:

# Get all memories for a user
all_memories = m.get_all(user_id="alex")
print(all_memories)
# Output: [
#   {"memory": "User is named Alex"},
#   {"memory": "User is vegetarian"},
#   {"memory": "User loves Italian food"}
# ]

# Search for specific information
results = m.search(
    query="What are Alex's food preferences?",
    user_id="alex"
)
print(results)
# Returns relevant memories ranked by relevance

Configure Mem0 with a vector store:

from mem0 import Memory

config = {
    "vector_store": {
        "provider": "qdrant",
        "config": {
            "host": "localhost",
            "port": 6333
        }
    }
}

m = Memory.from_config(config)

Mem0 works with OpenAI by default, but supports other LLM providers. It integrates with popular frameworks like LangChain, LangGraph, and CrewAI, making it easy to add memory to existing agent implementations.

Zep: Temporal knowledge graphs for agent memory

Zep is a context engineering platform built around temporal knowledge graphs. Unlike systems that simply store facts, Zep tracks how information changes over time and maintains historical context. This makes it particularly useful when facts evolve or contradict previous information.

How Zep works

At the core of Zep is Graphiti, an open-source library for building temporal knowledge graphs. When you add conversation data to Zep, Graphiti extracts entities (for example, people, places, and concepts) and relationships (for example, works at, lives in, and prefers) and stores them as a graph.

What makes this graph temporal is Zep’s bi-temporal model, which tracks two separate timelines:

  • Event time reflects when events actually happened in the real world.
  • Transaction time reflects when information was recorded in the system.
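
For example, suppose a user mentions in June that they actually moved back in March: the move belongs to event time, while June is the transaction time. Graphiti's edges carry both, roughly along these lines (shown as a plain dict for illustration; check the library for the exact field names):

# Illustrative only: one relationship carrying both timelines.
edge = {
    "fact": "User lives in Seattle",
    "valid_at": "2024-03-01T00:00:00Z",     # event time: when the fact became true
    "invalid_at": None,                      # still true as far as the graph knows
    "created_at": "2024-06-15T10:30:00Z",    # transaction time: when the system learned it
}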

Each edge in the graph includes validity metadata. When a fact changes, Zep doesn’t overwrite the old information. Instead, it marks the old fact’s validity period as having ended and creates a new fact with its own validity period.

For example, if a user says, “I moved from Austin to Seattle,” Zep creates two location relationships with different time periods. When you ask, “Where does the user live?” you get Seattle. But you can also query historical information like, “Where did the user live in March 2024?” and get Austin.

Zep uses a hybrid search approach, combining vector similarity (for semantic matching) and graph traversal (for relationship-based queries). This lets you ask questions like, “What companies has the user worked for?” and get answers that understand the relationships between entities.

Tracking change over time

The temporal aspect becomes valuable in real-world scenarios where information changes:

  • User preferences evolve (for example, stopped being vegetarian or changed career paths)
  • Business data updates (for example, company acquisitions or org changes)
  • Relationships change (for example, manager changes or location changes)

Traditional memory systems handle these changes either by overwriting old data (losing history) or by accumulating contradictory facts (creating confusion). Temporal graphs maintain both current truth and historical context.

Integrating Zep into your agent

Install Graphiti:

pip install graphiti-core

Set up environment variables for your graph database:

export OPENAI_API_KEY=your_openai_api_key
export NEO4J_URI=bolt://localhost:7687
export NEO4J_USER=neo4j
export NEO4J_PASSWORD=password

Initialize Graphiti and build indices:

import os
import asyncio
from dotenv import load_dotenv
from datetime import datetime, timezone

from graphiti_core import Graphiti
from graphiti_core.nodes import EpisodeType

load_dotenv()

neo4j_uri = os.environ.get('NEO4J_URI', 'bolt://localhost:7687')
neo4j_user = os.environ.get('NEO4J_USER', 'neo4j')
neo4j_password = os.environ.get('NEO4J_PASSWORD', 'password')

# Initialize with Neo4j
graphiti = Graphiti(
    uri=neo4j_uri,
    user=neo4j_user,
    password=neo4j_password
)

async def main():
    # Build indices (one-time setup)
    await graphiti.build_indices_and_constraints()

Add conversation data as episodes:

    # First conversation
    await graphiti.add_episode(
        name="User location update",
        episode_body="The user mentioned they live in Austin and work at TechCorp",
        source=EpisodeType.message,
        source_description="Chat message from user onboarding",
        reference_time=datetime.now(timezone.utc),
    )

    # Later conversation with updated information
    await graphiti.add_episode(
        name="User relocation",
        episode_body="The user said they moved to Seattle and started at NewCompany",
        source=EpisodeType.message,
        source_description="Chat message from follow-up",
        reference_time=datetime.now(timezone.utc),
    )

Query with temporal awareness:

    # Get current facts
    results = await graphiti.search(
        query="Where does the user live and where do they work?",
        num_results=5
    )

    for result in results:
        print(f"Fact: {result.fact}")
        # Graphiti exposes event-time validity on each edge as valid_at / invalid_at
        print(f"Valid from: {result.valid_at}")
        if result.invalid_at:
            print(f"Valid until: {result.invalid_at}")
        print("---")

    # Output includes both historical and current facts with validity periods

# Run the full example
asyncio.run(main())

Zep also offers a cloud-hosted version if you don’t want to manage your own graph database infrastructure. The Graphiti library supports Neo4j, Amazon Neptune, FalkorDB, and Kuzu as database backends.


Agent memory solutions: Letta vs Mem0 vs Zep vs Cognee (Part 2)

This is Part 2 of our agent memory comparison series. Part 1 covered the fundamentals of agent memory, Mem0’s drop-in memory layer, and Zep’s temporal knowledge graphs. In this part, we’ll explore Cognee’s semantic knowledge graphs and Letta’s stateful agent platform.

Cognee: Semantic memory with knowledge graphs

Cognee is an open-source library that builds semantic knowledge graphs from your data. It’s designed to provide sophisticated semantic understanding through knowledge graph construction.

How Cognee works

Traditional retrieval-augmented generation (RAG) systems embed documents into vectors and retrieve chunks based on similarity. This approach works, but it overlooks the relationships between concepts.

Cognee takes a different approach. When you add data, it builds a knowledge graph that captures entities and their relationships. The graph uses ontologies to understand how concepts relate to one another.

For example, if your documents mention “Python,” “machine learning,” and “scikit-learn,” Cognee doesn’t just store these keywords as separate chunks. It understands that Python is a programming language, machine learning is a field, and scikit-learn is a Python library for machine learning. These relationships enable smarter retrieval.

Building knowledge graphs

Cognee combines two storage layers:

  • A vector database for semantic similarity search, which supports Weaviate, Qdrant, LanceDB, and others.

  • A graph database for relationship modeling, which supports Neo4j and Kuzu.

When you query Cognee, it searches both the vector database for semantically similar content and the graph database for related concepts. This hybrid approach lets you ask questions like “What Python libraries are mentioned?” and get answers that understand Python as a programming language context.

The knowledge graph is built automatically during the “cognify” process, which analyzes your data, extracts entities and relationships, and structures them according to domain ontologies.

Adding Cognee to your agent

Install Cognee:

pip install cognee

Run a basic example:

import cognee
import asyncio

async def main():
    # Add data to Cognee
    await cognee.add("Natural language processing (NLP) is an interdisciplinary subfield of computer science.")

    # Process into knowledge graph
    await cognee.cognify()

    # Query with semantic understanding
    result = await cognee.search(query_text="Tell me about NLP")
    print(result)

asyncio.run(main())

Work with multiple documents:

import cognee
import asyncio

async def main():
    # Add related documents
    documents = [
        "GPT-4 is a large language model developed by OpenAI.",
        "OpenAI was founded in 2015 by Sam Altman, Elon Musk, and others.",
        "Sam Altman is the CEO of OpenAI and formerly led Y Combinator."
    ]

    await cognee.add(documents)
    await cognee.cognify()

    # Query understanding relationships
    result = await cognee.search(query_text="Who leads OpenAI?")
    print(result)
    # Returns answer understanding the relationships between
    # Sam Altman, CEO role, and OpenAI

asyncio.run(main())

Lastly, configure database backends using environment variables, typically stored in a .env file in your project’s root directory. The library automatically loads these variables on startup.

Here is an example .env file for connecting to Qdrant and Neo4j:

LLM_PROVIDER="openai"
LLM_MODEL="gpt-4"
LLM_API_KEY="your_openai_api_key"

VECTOR_DB_PROVIDER="qdrant"
# QDRANT_URL="http://localhost:6333" # Optional: for non-default host

GRAPH_DB_PROVIDER="neo4j"
NEO4J_URI="bolt://localhost:7687"
NEO4J_USER="neo4j"
NEO4J_PASSWORD="password"

Cognee integrates with agent frameworks like LangChain and CrewAI, making it easy to add semantic memory to existing agent implementations. The knowledge graph construction is automatic, so you don’t need to manually define schemas or relationships.

Letta: Memory as infrastructure

Letta takes a fundamentally different approach from the other solutions in this article. While Mem0, Zep, and Cognee add memory capabilities to stateless agents, Letta is a general-purpose memory programming system for building stateful agents with memory embedded in their core architecture.

How Letta works

The Letta Platform treats agents as persistent entities with continuous state, not as functions called on demand. When you create a Letta agent, it exists on the server and maintains state even when your application isn’t running. There are no sessions or conversation threads. Each agent has a single, perpetual existence.

This fundamentally changes how you think about agent development. Instead of fetching relevant memories before each LLM call, the agent continuously operates with its memory state. Memory isn’t something you manage externally. It’s part of the agent’s core architecture.

Letta was built by the researchers behind MemGPT, which pioneered the idea of managing LLM context the way an operating system manages memory. The platform provides sophisticated memory management that lets agents learn how to use tools, evolve their personalities, and improve their capabilities over time.

Core memory and archival memory

Letta implements a two-tier memory architecture:

  • The core memory lives inside the agent’s context window and is always available. This is the most important component for making your agent maintain consistent state across turns. Core memory is organized into memory blocks that the agent can read and write to directly, holding the agent’s most critical, frequently-accessed state.
  • The archival memory provides long-term retrieval for information outside the context window. When an agent needs information from archival memory, it retrieves the relevant pieces and loads them into core memory. Letta also supports file-based retrieval through the Letta Filesystem.

This hierarchy is similar to the way operating systems manage RAM and disk storage. Hot data stays in RAM (core memory), while cold data lives on disk (archival memory) and is paged in when needed.

Memory blocks: Self-modifying agent state

Memory blocks are the fundamental units of Letta’s memory system. Each block is a structured piece of information that the agent can modify. Unlike external memory systems, where you programmatically update memories through APIs, Letta agents can directly edit their own memory blocks.

For example, a Letta agent might have three kinds of memory blocks storing different data:

  • Human Block: Information about the user (such as name, preferences, and history)
  • Persona Block: The agent’s identity and behavior guidelines
  • Custom Blocks: Domain-specific state (such as project context and code knowledge)

These blocks are editable by the agent itself, other agents, or developers through the API. This enables agents to learn and adapt their behavior based on experience.

Letta agents use three primary tools to edit their memory blocks:

  • memory_insert - Insert content into a block (useful for appending new information)
  • memory_replace - Search and replace specific text within a block (useful for updating existing information)
  • memory_rethink - Rewrite an entire block (useful for reorganizing or restructuring memory)

These tools give agents fine-grained control over how they manage their own memory, enabling them to decide what information is important enough to persist and how to organize it.
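
For intuition, an agent-issued memory_replace call might look roughly like this; the argument names are illustrative rather than the exact tool schema:

# Schematic example of a self-directed memory edit a Letta agent might emit.
# Argument names are illustrative; the actual tool schema may differ.
example_tool_call = {
    "tool": "memory_replace",
    "arguments": {
        "label": "human",
        "old_str": "Role: Senior Backend Engineer",
        "new_str": "Role: Engineering Manager",
    },
}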

Memory blocks can also be shared between multiple agents. By attaching the same block to different agents, you can create powerful multi-agent systems where agents collaborate through shared memory. For example, multiple agents working on a project could share an “organization” or “project context” block, allowing them to maintain synchronized state and share learned information.

When agents build up extensive conversation histories, Letta can offload memory updates to background processes. A secondary agent handles memory consolidation separately from the main conversation flow, preventing memory operations from slowing down responses. The background processing frequency is configurable (defaulting to every 5 conversation steps), letting you balance how current the memory stays against processing overhead.
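
Background memory management is typically enabled when the agent is created (full client setup appears in the Building with Letta section below). A minimal sketch, assuming the enable_sleeptime flag exposed by recent versions of the SDK; check the documentation for your version:

from letta_client import Letta

client = Letta(base_url="http://localhost:8283")

# Assumption: enable_sleeptime attaches a companion agent that consolidates
# memory between conversation steps instead of during them.
agent = client.agents.create(
    memory_blocks=[
        {"label": "human", "value": "Name: Sarah."},
        {"label": "persona", "value": "I am a helpful coding assistant."},
    ],
    model="openai/gpt-4o-mini",
    embedding="openai/text-embedding-3-small",
    enable_sleeptime=True,
)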

Letta's current default agent architecture leverages the model's built-in reasoning capabilities and produces assistant messages directly rather than routing them through intermediate tool invocations, which makes it compatible with a wide range of LLM providers. When switching between architectural patterns, you export your agent configuration to the .af format and update the agent type setting.

Building with Letta

Install Letta:

# For local server development
# The local server loads the sqlite-vec extension automatically, but you may need to install it if it isn't already present
pip install letta

# For client SDK only (connecting to Letta Cloud or existing server)
pip install letta-client

Set up environment variables:

export OPENAI_API_KEY=your_openai_api_key

Start the local server:

letta server

The Letta platform includes a visual Agent Development Environment (ADE) that lets you:

  • Create and configure agents through a UI
  • View agent memory blocks in real-time
  • Monitor the agent’s decision-making process
  • Test and debug agent behavior

Create an agent with the Python SDK (on the local server):

from letta_client import Letta

# Connect to local Letta server
client = Letta(base_url="http://localhost:8283")

# Create agent with memory blocks
agent = client.agents.create(
    memory_blocks=[
        {
            "label": "human",
            "value": "Name: Sarah. Role: Senior Backend Engineer. Preferences: Prefers Python over JavaScript, likes concise explanations."
        },
        {
            "label": "persona",
            "value": "I am a helpful coding assistant. I adapt my explanations to the user's expertise level and communication style."
        }
    ],
    model="openai/gpt-4o-mini",
    embedding="openai/text-embedding-3-small"
)

print(f"Created agent: {agent.id}")

Chat with the agent:

# Send message to stateful agent
response = client.agents.messages.create(
    agent_id=agent.id,
    messages=[
        {"role": "user", "content": "Can you help me optimize this SQL query?"}
    ]
)

# Agent responds with context from its memory blocks
for message in response.messages:
    if hasattr(message, 'reasoning'):
        print(f"Reasoning: {message.reasoning}")
    if hasattr(message, 'content'):
        print(f"Content: {message.content}")

Update memory blocks programmatically:

# Retrieve a specific memory block by label
human_block = client.agents.blocks.retrieve(
    agent_id=agent.id,
    block_label="human"
)

# Update the block with new information
client.blocks.modify(
    block_id=human_block.id,
    value="Name: Sarah. Role: Senior Backend Engineer. Preferences: Prefers Python over JavaScript, likes concise explanations. Currently working on: Microservices migration project."
)
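
Archival memory, described earlier, can also be seeded and inspected from the client side. This is a sketch under the assumption that your SDK version exposes the agents.passages endpoints for archival memory; check the current API reference:

# Assumption: archival memory is exposed as "passages" in the client SDK.
# Write a passage into the agent's archival memory.
client.agents.passages.create(
    agent_id=agent.id,
    text="The 2023 architecture review concluded the billing service should move to gRPC."
)

# List the agent's archival passages (retrieval into core memory happens
# when the agent itself decides to search archival memory).
passages = client.agents.passages.list(agent_id=agent.id)
for passage in passages:
    print(passage.text)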

Use Letta Cloud (the hosted platform):

from letta_client import Letta

# Connect to Letta Cloud
client = Letta(token="YOUR_LETTA_API_KEY")

# Create agent (same API, hosted infrastructure)
agent = client.agents.create(
    model="openai/gpt-4.1",
    embedding="openai/text-embedding-3-small",
    memory_blocks=[
        {"label": "human", "value": "The human's name is Alex."},
        {"label": "persona", "value": "I am a helpful assistant."}
    ]
)

Create shared memory blocks for multi-agent systems:

# Create a shared memory block
shared_block = client.blocks.create(
    label="project_context",
    description="Shared knowledge about the current project",
    value="Project: Building a chat application. Tech stack: Python, FastAPI, React. Timeline: 3 months.",
    limit=4000
)

# Create multiple agents that share this block
agent1 = client.agents.create(
    memory_blocks=[
        {"label": "persona", "value": "I am a backend specialist."}
    ],
    block_ids=[shared_block.id],  # Attach shared block
    model="openai/gpt-4o-mini",
    embedding="openai/text-embedding-3-small"
)

agent2 = client.agents.create(
    memory_blocks=[
        {"label": "persona", "value": "I am a frontend specialist."}
    ],
    block_ids=[shared_block.id],  # Both agents share this block
    model="openai/gpt-4o-mini",
    embedding="openai/text-embedding-3-small"
)

# Both agents can now access and modify the shared project context

Important note about block identifiers: Always reference blocks by their block_ids (UUIDs) when attaching them to agents. While labels provide readable names for your convenience, they don't serve as unique identifiers; the same label can appear on multiple blocks throughout the system. For lookups scoped to a specific agent, combine block_label with the agent ID (as demonstrated in the memory block retrieval example above).

The key distinction is that Letta agents are stateful by design. The memory isn’t managed separately from the agent. It’s integral to how the agent operates. This enables more sophisticated agent behaviors, like self-improvement, personality evolution, and complex multi-turn interactions where the agent truly learns from experience.

Letta is also fully interoperable with other memory solutions. Through custom Python tools or Model Context Protocol (MCP) servers, you can use Mem0, Zep, Cognee, or any other tool within a Letta agent. This lets you combine Letta’s stateful architecture with specialized memory systems for specific use cases.

Letta’s AI Memory SDK

The AI Memory SDK provides pluggable memory for developers who want memory capabilities without adopting a full stateful agent platform.

Note: The AI Memory SDK is experimental and actively developed. For production systems requiring sophisticated memory management, the full Letta Platform provides more control and reliability. The SDK works well for its intended use case (drop-in memory for existing apps), but it’s currently in a prototype phase.

Although it is separate from the Letta Platform, the AI Memory SDK was built by the same team and uses Letta agents (subagents) running under the hood on Letta Cloud for memory management. It provides the benefits of Letta’s memory handling in a lightweight package that integrates with your existing LLM setup, without requiring the adoption of full stateful agent infrastructure.

How the AI Memory SDK works

Like Mem0 or Zep, the AI Memory SDK enables you to initialize a memory instance, add conversations, and retrieve context when needed. The difference is that the AI Memory SDK runs background Letta agents (subconscious memory agents) that process conversations asynchronously between interactions, during what Letta calls "sleeptime", rather than inline while the user is waiting.

These background agents automatically maintain two memory blocks per user: a conversation summary and a user profile. You retrieve this learned context and inject it into your prompts, similar to Mem0 or Zep, but memory formation happens through autonomous agents rather than direct LLM extraction or graph algorithms.

The background processing approach means memory updates don’t slow down your application’s responses. However, the memory you retrieve might not yet reflect the very latest conversation turns. This trade-off works well for most applications, though systems that need immediate access to every new fact (like transactional applications) would benefit from the full Letta Platform’s synchronous memory writes.

The AI Memory SDK depends on Letta Cloud to run its background memory agents. If you need to keep everything on your own infrastructure, the full Letta Platform offers self-hosting through Docker, while the SDK is built for applications with internet connectivity.

For more complex memory architectures—such as custom memory blocks, agents that modify their own memory, or persistent agent personalities—use the full Letta Platform.

Implementing the AI Memory SDK

Install the SDK:

pip install ai-memory-sdk

Set up environment variables:

export LETTA_API_KEY=your_letta_api_key
export OPENAI_API_KEY=your_openai_api_key

Set up a basic integration with OpenAI:

from openai import OpenAI
from ai_memory_sdk import Memory

# Initialize memory and OpenAI client
memory = Memory()
openai_client = OpenAI()

def chat_with_memory(message, user_id="default_user"):
    # Get or initialize user memory
    user_memory = memory.get_user_memory(user_id)
    if not user_memory:
        memory.initialize_user_memory(user_id, reset=True)

    # Retrieve formatted memory for prompt
    memory_prompt = memory.get_user_memory(user_id, prompt_formatted=True)

    # Build system prompt with memory context
    system_prompt = f"<system>You are a helpful AI assistant.</system>\n{memory_prompt}"

    # Create messages
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": message}
    ]

    # Get response from LLM
    response = openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages
    )

    # Extract response and save to memory
    assistant_response = response.choices[0].message.content
    messages.append({"role": "assistant", "content": assistant_response})
    memory.add_messages(user_id, messages)

    return assistant_response

# Use it
response = chat_with_memory("Hi, I'm learning Python", user_id="alice")
print(response)

# Later conversation remembers context
response = chat_with_memory("What resources should I start with?", user_id="alice")
print(response)  # Agent remembers Alice is learning Python

Conduct a semantic search through conversation history:

# Search past conversations
results = memory.query(
    user_id="alice",
    query="What programming topics did we discuss?"
)

for result in results:
    print(f"Message: {result['content']}")
    print(f"Relevance: {result['score']}")
    print("---")

The AI Memory SDK automatically generates user summaries, maintains conversation history, and provides semantic search capabilities. It processes messages asynchronously, so memory updates happen in the background without blocking your application.

Choosing the right memory solution

Each memory approach excels in different scenarios:

  • Temporal graphs (like Zep) excel when facts change frequently—user preferences evolve, job titles change, locations shift. They maintain both current truth and historical context.
  • Semantic graphs (like Cognee) excel when relationships between concepts matter more than change over time. They’re ideal for understanding how ideas connect in technical documentation, research papers, or knowledge bases.
  • Stateful memory blocks (like Letta) excel when agents need to self-modify their knowledge and behavior. They’re best for agents that learn from experience and adapt their personalities over time.

Mem0 vs Letta

Mem0 | AI Memory SDK | Letta Platform
Drop-in memory layer for stateless agents | Pluggable memory using background agents for intelligent processing | Stateful agents with self-modifying memory blocks
Reduces tokens by 90%; production-ready with a hybrid architecture (vector + key-value + graph) | More sophisticated than simple extraction | Memory lives in the context window; shareable across agents
  • Choose Mem0 for minimal code changes to existing apps.
  • Choose the AI Memory SDK for agent-based processing without a full stateful architecture.
  • Choose Letta Platform for agents that evolve their own memory and collaborate through shared state.

Mem0 vs Zep

Mem0 | Zep
Simple memory extraction and consolidation | Temporal knowledge graphs with bi-temporal modeling
No temporal tracking; updates overwrite or merge memories | Tracks when facts were valid and when they change
  • Choose Mem0 for straightforward user, session, or agent memory without history.
  • Choose Zep when facts evolve (like when location changes or preference shifts) and you need both current state and historical context, for example, “What did the user prefer in Q1 vs Q3?”

Mem0 vs Cognee

Mem0 | Cognee
Memory extraction and retrieval | Semantic knowledge graphs with ontologies
Organizes into user/session/agent levels but doesn't build relationship graphs | Understands concept relationships (Python → programming language → ML libraries)
  • Choose Mem0 for user preference memory and conversation context.
  • Choose Cognee when relationships between concepts matter as much as the facts (it’s ideal for technical docs, research papers, or complex knowledge bases).

Zep vs Cognee

Zep | Cognee
Temporal knowledge graphs track when facts are valid | Semantic knowledge graphs track how concepts relate
Bi-temporal modeling for changing information | Ontology-based relationships for domain understanding
  • Choose Zep when data changes over time and temporal history matters.
  • Choose Cognee when understanding concept relationships and dependencies is critical.

Zep vs Letta

Zep | AI Memory SDK | Letta Platform
External temporal knowledge graph that agents query | Simple drop-in memory without temporal tracking | Memory blocks in the context window, directly editable by agents
Tracks validity periods for evolving facts | Agent-based processing | Continuous state operation
  • Choose Zep for enterprise apps needing temporal history and fact evolution tracking.
  • Choose the Letta AI Memory SDK for drop-in memory with intelligent processing.
  • Choose Letta Platform for stateful agents with shared memory blocks across multi-agent systems.

Cognee vs Letta

Cognee | AI Memory SDK | Letta Platform
External semantic knowledge graph | Drop-in memory with agent-based processing | Self-modifying memory blocks that agents edit directly
Retrieves information based on concept relationships | No knowledge graph construction | Enables continuous state evolution
  • Choose Cognee for semantic understanding of concept relationships.
  • Choose Letta Memory SDK for simpler integration without graph complexity.
  • Choose Letta Platform for agents that modify their own memory structure. (Note: You can use Cognee within Letta agents via custom tools/MCP servers.)

Quick comparison

Solution | Memory architecture | Best for
Mem0 | Drop-in layer | Simple user/session memory
Zep | Temporal graph | Tracking changing facts over time
Cognee | Semantic graph | Understanding concept relationships
AI Memory SDK | Pluggable agents | Drop-in intelligent processing
Letta Platform | Stateful agents | Self-modifying, persistent agents

Combining solutions as you scale

These memory solutions aren’t mutually exclusive. Many production applications start simple and add sophistication as complexity grows. An example evolutionary path:

  1. Begin with the AI Memory SDK for lightweight memory integration with minimal code changes
  2. Graduate to the full Letta Platform when you need agents to modify their own behavior, maintain persistent personalities, or coordinate across multi-agent systems
  3. Integrate specialized memory systems like Zep (for temporal queries) or Cognee (for semantic knowledge graphs) through Letta’s MCP support when you need domain-specific capabilities

This approach lets you avoid premature optimization while preserving the ability to scale memory sophistication alongside application requirements.

Interoperability: Letta Platform is fully interoperable with other memory systems. Letta agents can use Mem0, Zep, Cognee, or any other tool through custom Python tools or by connecting to MCP (Model Context Protocol) servers. You can combine Letta’s stateful architecture with specialized memory systems for specific use cases.

Further reading and resources

Official documentation

  • Letta - Platform documentation and SDK references
  • Mem0 - API reference and integration guides
  • Zep - Platform docs and Graphiti library reference
  • Cognee - Getting started guides and examples

Source code

Excellent comprehensive comparison. A few additions that complement this analysis:

Sleeptime Agents in Letta

One memory-related feature worth highlighting is Letta’s sleeptime agents. When enabled, a companion agent asynchronously processes conversation history in the background and updates memory blocks without blocking the primary agent. This creates a two-tier processing system where the primary agent handles real-time responses while the sleeptime agent manages memory consolidation.

You can configure how frequently the sleeptime agent runs using sleeptime_agent_frequency (default is every 5 steps). This is particularly useful for long-running agents that accumulate substantial conversation history.

Architecture Considerations

For readers implementing Letta: the current recommended architecture is letta_v1_agent, which uses native reasoning (via OpenAI’s Responses API) and generates assistant messages directly from models rather than through tool calls. This works with any LLM, not just tool-calling models.

Legacy architectures (memgpt_agent, memgpt_v2_agent) are still supported but use different patterns. Migration between architectures requires exporting to .af format and modifying the agent_type field.

Block Label Uniqueness

One clarification on your note about block identifiers: block labels are not unique per agent, and in fact they're not unique at all. Multiple blocks across the entire system can share the same label. This is why you must use block_ids (UUIDs) when attaching blocks to agents, not labels. Labels are just human-readable identifiers for convenience when retrieving blocks within an agent's context.

MCP Integration Depth

The MCP interoperability you mention is significant. Letta supports both Python tools and MCP servers, meaning you can combine Letta’s stateful architecture with specialized memory systems (Mem0, Zep, Cognee) or any other MCP-compatible tool. This makes Letta a coordination layer for multi-system agent architectures.

Looking forward to seeing Part 2 with the AI Memory SDK comparison—that lightweight library provides an interesting middle ground for developers who want Letta’s memory capabilities without adopting the full stateful agent paradigm.

Excellent Part 2. The comparison tables provide clear decision frameworks for choosing between solutions.

Practical Considerations for AI Memory SDK

A few implementation notes worth highlighting:

Sleeptime Processing Characteristics

The AI Memory SDK’s sleeptime processing creates an interesting trade-off. Memory updates happen asynchronously between interactions, which means:

  • Zero latency overhead during user interactions (memory processing doesn’t block responses)
  • Memory context may be slightly behind the most recent conversation turns
  • Best suited for applications where memory can be eventually consistent rather than immediately updated

For use cases requiring instant memory reflection (like transactional systems where every fact must be immediately queryable), the full Letta Platform with direct memory block writes provides stronger consistency guarantees.

Letta Cloud Dependency

Important architectural note: AI Memory SDK currently requires Letta Cloud for the background agents that handle memory processing. This differs from the full Letta Platform, which can run entirely self-hosted via Docker.

If your deployment requirements mandate fully offline/air-gapped operation, the self-hosted Letta Platform would be the appropriate choice. The AI Memory SDK is designed for cloud-connected applications.

Community Extensions

For readers interested in extended architectures: community member rari5613 built a Memory Layer API on top of the AI Memory SDK that exposes Letta’s memory capabilities as a standalone HTTP API. This enables external systems (n8n, Langchain, AutoGen) to leverage Letta’s sleeptime processing without direct SDK integration.

Migration Paths

The comparison tables effectively show that these solutions aren’t mutually exclusive. A common pattern:

  1. Start with AI Memory SDK for drop-in memory
  2. Migrate to full Letta Platform as agents need self-modification capabilities
  3. Add Zep/Cognee via MCP integration for specialized temporal/semantic querying alongside Letta’s core memory

This layered approach lets you scale memory sophistication as application complexity grows.

Great comprehensive resource for anyone evaluating agent memory solutions.