Agent memory solutions: Letta vs Mem0 vs Zep vs Cognee

If you’re building AI agents, you’ve probably run into a fundamental limitation: LLMs forget everything the moment you close the conversation. This article explains how agent memory works and compares five popular solutions to the problem: Mem0, Zep, Cognee, Letta Platform, and AI Memory SDK.

Agent memory

When you call an LLM like GPT-4 or Claude, it only knows what’s in the current prompt. There’s no persistent memory of past conversations, user preferences, or learned behaviors. This stateless design works fine for one-off questions, but breaks down when building agents that need to maintain context across multiple interactions.

Agent memory solves this by adding persistent storage that agents can read from and write to. It transforms stateless LLMs into stateful agents that can remember user preferences, learn from past interactions, and provide personalized experiences.

The simplest form of memory is keeping the conversation history in the context window. But context windows have limits, making this solution expensive and slow. The diagram below shows the difference between stateless and stateful approaches. Stateless agents require manual prompt engineering to manage an unstructured context window, while stateful agents use structured memory blocks. A memory system like Letta automatically compiles context from persistent state storage into these organized blocks.

Agent memory typically involves two components:

  • Working memory is the information currently loaded in the agent’s context window, which functions similarly to your short-term memory during a conversation. It holds the immediate context needed to respond to the current request.
  • Long-term storage consists of external databases (vector stores, knowledge graphs, or traditional databases) that store information beyond what fits in the context. Agents retrieve relevant information from long-term storage and load it into working memory as needed.

The challenge is deciding what information to store, how to organize it, and when to retrieve it. Different memory solutions approach these decisions in different ways.

How agent memory works

Before diving into specific tools, we’ll look at the different architectural approaches to agent memory. Each solution in this article takes one (or a combination) of these approaches.

Drop-in memory layers

This approach treats memory as an external service that wraps around your existing stateless agent. You continue using your LLM as before, but prior to each request, the memory layer retrieves relevant context and injects it into the prompt. After each response, it extracts and stores important information.

Think of drop-in memory layers like adding a note-taking assistant that sits between you and your agent. The agent itself stays stateless, but the memory layer manages what to remember and when to surface it.

Tools like Mem0 use this approach. They provide APIs to store and retrieve memories, but your agent architecture stays largely the same.
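The retrieve-inject-store loop is simple enough to sketch in plain Python. Everything below is a toy stand-in (keyword matching instead of vector search, a fake LLM call), but it captures the essence of a drop-in layer:

```python
class MemoryLayer:
    """Drop-in memory: retrieve before the call, store after it; the agent stays stateless."""

    def __init__(self):
        self.memories = []  # stand-in for a vector store

    def retrieve(self, query):
        # Toy relevance check; a real layer would use embedding similarity
        words = query.lower().split()
        return [m for m in self.memories if any(w in m.lower() for w in words)]

    def store(self, fact):
        self.memories.append(fact)


def stateless_llm(prompt):
    # Stand-in for a real LLM call
    return f"(answer based on: {prompt})"


def chat(layer, user_message):
    # 1. Retrieve relevant context and inject it into the prompt
    context = layer.retrieve(user_message)
    prompt = f"Context: {context}\nUser: {user_message}"
    response = stateless_llm(prompt)
    # 2. Store information from the turn (a real layer would extract distilled facts)
    layer.store(user_message)
    return response


layer = MemoryLayer()
chat(layer, "I am vegetarian")
reply = chat(layer, "Suggest a vegetarian dinner")
print(reply)  # the prompt for the second turn now carries the stored preference
```

The agent itself never changes; all statefulness lives in the wrapper, which is why this pattern can be bolted onto an existing application.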

Temporal knowledge graphs

Facts change. A user might move from San Francisco to New York, then back to San Francisco. Their job title might change. Their preferences might evolve. Most memory systems either overwrite old information or accumulate contradictions.

Temporal knowledge graphs solve the problem of lost or contradictory information by tracking when facts are valid. Instead of storing “User lives in San Francisco,” they store “User lived in San Francisco (valid: Jan 2023–May 2024),” and “User lives in New York (valid: May 2024–present).”

A temporal knowledge graph maps entities (users, places, and concepts) as nodes and relationships as edges with time metadata. When you query a graph, you can ask for current facts or historical context.
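A toy version of the idea, as a sketch only (real temporal graph engines store far richer structure), attaches a validity interval to each fact:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class TemporalFact:
    subject: str
    relation: str
    obj: str
    valid_from: date
    valid_until: Optional[date] = None  # None means still valid


facts = [
    TemporalFact("user", "lives_in", "San Francisco", date(2023, 1, 1), date(2024, 5, 1)),
    TemporalFact("user", "lives_in", "New York", date(2024, 5, 1)),
]


def facts_as_of(facts, when):
    """Return the facts that were valid on a given date."""
    return [
        f for f in facts
        if f.valid_from <= when and (f.valid_until is None or when < f.valid_until)
    ]


print(facts_as_of(facts, date(2025, 1, 1))[0].obj)  # New York (current fact)
print(facts_as_of(facts, date(2023, 6, 1))[0].obj)  # San Francisco (historical fact)
```

Nothing is ever overwritten; a change simply closes one interval and opens another, so both current and historical queries stay answerable.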

Zep uses this approach with its Graphiti engine, which builds temporal knowledge graphs from conversations and business data.

Semantic knowledge graphs

While temporal knowledge graphs focus on when facts are valid, semantic knowledge graphs focus on what facts mean and how they relate to one another. This approach uses ontologies (structured representations of domain knowledge) to understand the relationships between concepts.

Instead of just storing chunks of text in a vector database, semantic knowledge graphs build structured representations of how the text chunks relate to one another. When you mention “Python,” the system understands it relates to “programming languages,” “Guido van Rossum,” and “machine learning libraries.” These relationships enable more intelligent retrieval.

This approach combines vector search for semantic similarity with graph traversal for relationship-based queries. You can ask, “What programming languages has the user worked with?” and get answers that understand the conceptual relationships.

Cognee builds these semantic knowledge graphs on the fly during document processing and retrieval.

Stateful programming

The previous three approaches treat memory as a feature you add to stateless agents. But there’s a fundamentally different approach: designing agents to be stateful from the ground up.

In stateful programming, memory isn’t an add-on. The agent’s entire architecture revolves around maintaining and modifying state over time. Instead of retrieving memories before each request, the agent continuously operates with persistent state.

This is similar to the difference between functional programming (stateless) and object-oriented programming (stateful). In functional programming, functions don’t maintain state between calls. In object-oriented programming, objects maintain internal state that persists and evolves.
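The contrast looks like this in toy Python (illustrative only, not any framework's API):

```python
def stateless_reply(history, message):
    """Stateless style: the caller must pass in and carry all context on every call."""
    prompt = "\n".join(history + [message])
    response = f"reply to: {message}"  # stand-in for an LLM call on the full prompt
    return response, history + [message]  # caller is responsible for the state


class StatefulAgent:
    """Stateful style: the agent owns its state and evolves it across calls."""

    def __init__(self, persona):
        self.persona = persona
        self.notes = []  # persists between calls

    def reply(self, message):
        self.notes.append(message)  # the agent updates its own memory
        return f"{self.persona} reply to: {message} (knows {len(self.notes)} facts)"


agent = StatefulAgent("assistant")
agent.reply("I'm Alex")
print(agent.reply("I like Italian food"))  # second call builds on the first
```

With the stateless function, forgetting to thread `history` through a call silently loses context; with the stateful object, continuity is the default.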

Stateful agents have memory blocks that they can read and write to directly. These memory blocks are part of the agent’s core architecture, not external storage accessed through APIs. The agent can decide what to remember, what to forget, and how to organize its own memory.

The Letta Platform uses this approach. Agents exist as persistent entities with self-modifying memory, not as stateless functions called on demand.

For developers who want memory capabilities without adopting the full stateful paradigm, pluggable memory solutions like Letta’s AI Memory SDK provide a middle ground. These are lightweight libraries that add memory to existing applications without requiring an architectural shift.

Mem0: Universal memory layer for LLM applications

Mem0 (pronounced “mem-zero”) is a drop-in memory layer that adds persistent memory to AI applications. It sits between your LLM and your application logic, automatically extracting and storing important information from conversations.

How Mem0 works

Mem0 uses a two-phase approach to manage memory.

  • Extraction phase: When you add messages to Mem0, it analyzes the conversation to identify important facts, preferences, and context. It uses an LLM (GPT-4 by default) to determine what’s worth remembering. This isn’t just storing raw messages. Mem0 distills conversations into structured memories like “User prefers vegetarian food” or “User works as a backend developer in Python.”

  • Update phase: After extraction, Mem0 intelligently updates its memory store using a hybrid architecture that combines vector databases (for semantic search), graph databases (for relationships), and key-value stores (for fast lookups). If new information contradicts existing memories, it updates them. If information is redundant, it consolidates. This prevents memory from growing unbounded with duplicate or outdated information. According to research by the Mem0 team, this approach achieves over 90% reduction in token costs compared to full-context methods, along with 26% better accuracy and 91% lower latency.
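To make the two phases concrete, here is a heavily simplified sketch. Mem0 uses an LLM for extraction and a hybrid store for updates; this toy version substitutes keyword checks and a dict, but the extract-then-reconcile flow is the same:

```python
def extract_facts(message):
    """Toy extraction phase: a real system would use an LLM to distill facts."""
    facts = []
    if "vegetarian" in message:
        facts.append(("diet", "vegetarian"))
    if "vegan" in message:
        facts.append(("diet", "vegan"))
    return facts


def update_store(store, new_facts):
    """Toy update phase: skip redundant facts, replace contradicting ones."""
    for key, value in new_facts:
        if store.get(key) == value:
            continue  # redundant: consolidate by skipping
        store[key] = value  # new or contradicting fact: update in place
    return store


store = {}
update_store(store, extract_facts("Hi, I'm vegetarian"))
update_store(store, extract_facts("Actually I'm vegan now"))
print(store)  # {'diet': 'vegan'}
```

The second message contradicts the first, so the store ends up with a single current fact rather than two conflicting ones.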

Mem0 Memory Architecture Diagram

Multi-level memory: User, session, and agent state

Mem0 organizes memories into three levels:

  • User memory stores long-term facts about individual users that persist across all sessions. This includes preferences, biographical information, and historical context. User memories are identified by a user_id.
  • Session memory stores temporary context for a specific conversation or task. Session memories are useful for maintaining context within a single interaction but can be discarded afterward. These use a session_id.
  • Agent memory stores information about the agent itself, its capabilities, learned behaviors, or instructions. This helps agents maintain consistent personas and improve over time. These use an agent_id.

Implementing Mem0 in your application

Install Mem0:

pip install mem0ai

Complete the basic setup and memory storage:

from mem0 import Memory

# Initialize memory
m = Memory()

# Add a conversation to memory
messages = [
    {"role": "user", "content": "Hi, I'm Alex. I'm a vegetarian."},
    {"role": "assistant", "content": "Hello Alex! I'll remember that you're vegetarian."},
    {"role": "user", "content": "I love Italian food."},
    {"role": "assistant", "content": "Great! I'll keep that in mind."}
]

# Store with user ID
m.add(messages, user_id="alex")

Retrieve memories:

# Get all memories for a user
all_memories = m.get_all(user_id="alex")
print(all_memories)
# Output: [
#   {"memory": "User is named Alex"},
#   {"memory": "User is vegetarian"},
#   {"memory": "User loves Italian food"}
# ]

# Search for specific information
results = m.search(
    query="What are Alex's food preferences?",
    user_id="alex"
)
print(results)
# Returns relevant memories ranked by relevance

Configure Mem0 with a vector store:

from mem0 import Memory

config = {
    "vector_store": {
        "provider": "qdrant",
        "config": {
            "host": "localhost",
            "port": 6333
        }
    }
}

m = Memory.from_config(config)

Mem0 works with OpenAI by default, but supports other LLM providers. It integrates with popular frameworks like LangChain, LangGraph, and CrewAI, making it easy to add memory to existing agent implementations.

Zep: Temporal knowledge graphs for agent memory

Zep is a context engineering platform built around temporal knowledge graphs. Unlike systems that simply store facts, Zep tracks how information changes over time and maintains historical context. This makes it particularly useful when facts evolve or contradict previous information.

How Zep works

At the core of Zep is Graphiti, an open-source library for building temporal knowledge graphs. When you add conversation data to Zep, Graphiti extracts entities (for example, people, places, and concepts) and relationships (for example, works at, lives in, and prefers) and stores them as a graph.

What makes this graph temporal is Zep’s bi-temporal model, which tracks two separate timelines:

  • Event time reflects when events actually happened in the real world.
  • Transaction time reflects when information was recorded in the system.

Each edge in the graph includes validity metadata. When a fact changes, Zep doesn’t overwrite the old information. Instead, it marks the old fact’s validity period as having ended and creates a new fact with its own validity period.

For example, if a user says, “I moved from Austin to Seattle,” Zep creates two location relationships with different time periods. When you ask, “Where does the user live?” you get Seattle. But you can also query historical information like, “Where did the user live in March 2024?” and get Austin.
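A minimal sketch of the bi-temporal idea (illustrative only; Zep's actual schema differs): each fact carries an event-time validity interval plus the transaction time at which the system recorded it.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class BiTemporalFact:
    fact: str
    event_from: date             # when it became true in the real world
    event_until: Optional[date]  # None = still true
    recorded_at: date            # when the system learned it (transaction time)


facts = [
    BiTemporalFact("user lives in Austin", date(2022, 1, 1), date(2024, 3, 15), date(2022, 1, 5)),
    BiTemporalFact("user lives in Seattle", date(2024, 3, 15), None, date(2024, 4, 1)),
]


def where_lived(facts, when):
    """Query event time: what was true in the world on a given date?"""
    for f in facts:
        if f.event_from <= when and (f.event_until is None or when < f.event_until):
            return f.fact
    return None


def known_as_of(facts, tx_date):
    """Query transaction time: what did the system know on a given date?"""
    return [f.fact for f in facts if f.recorded_at <= tx_date]


print(where_lived(facts, date(2024, 3, 1)))   # user lives in Austin
print(where_lived(facts, date(2025, 1, 1)))   # user lives in Seattle
print(known_as_of(facts, date(2023, 1, 1)))   # ['user lives in Austin']
```

Separating the two timelines lets you distinguish "the user moved in March" from "we only found out in April", which matters for auditing and for correcting late-arriving information.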

Zep Temporal Memory Architecture Diagram

Zep uses a hybrid search approach, combining vector similarity (for semantic matching) and graph traversal (for relationship-based queries). This lets you ask questions like, “What companies has the user worked for?” and get answers that understand the relationships between entities.

Tracking change over time

The temporal aspect becomes valuable in real-world scenarios where information changes:

  • User preferences evolve (for example, stopped being vegetarian or changed career paths)
  • Business data updates (for example, company acquisitions or org changes)
  • Relationships change (for example, manager changes or location changes)

Traditional memory systems handle these changes either by overwriting old data (losing history) or by accumulating contradictory facts (creating confusion). Temporal graphs maintain both current truth and historical context.

Integrating Zep into your agent

Install Graphiti:

pip install graphiti-core

Set up environment variables for your graph database:

export OPENAI_API_KEY=your_openai_api_key
export NEO4J_URI=bolt://localhost:7687
export NEO4J_USER=neo4j
export NEO4J_PASSWORD=password

Initialize Graphiti and build indices:

import os
import asyncio
from dotenv import load_dotenv
from datetime import datetime, timezone

from graphiti_core import Graphiti
from graphiti_core.nodes import EpisodeType

load_dotenv()

neo4j_uri = os.environ.get('NEO4J_URI', 'bolt://localhost:7687')
neo4j_user = os.environ.get('NEO4J_USER', 'neo4j')
neo4j_password = os.environ.get('NEO4J_PASSWORD', 'password')

# Initialize with Neo4j
graphiti = Graphiti(
    uri=neo4j_uri,
    user=neo4j_user,
    password=neo4j_password
)

async def main():
    # Build indices (one-time setup)
    await graphiti.build_indices_and_constraints()

Add conversation data as episodes:

    # First conversation
    await graphiti.add_episode(
        name="User location update",
        episode_body="The user mentioned they live in Austin and work at TechCorp",
        source=EpisodeType.message,
        source_description="Chat message from user onboarding",
        reference_time=datetime.now(timezone.utc),
    )

    # Later conversation with updated information
    await graphiti.add_episode(
        name="User relocation",
        episode_body="The user said they moved to Seattle and started at NewCompany",
        source=EpisodeType.message,
        source_description="Chat message from follow-up",
        reference_time=datetime.now(timezone.utc),
    )

Query with temporal awareness:

    # Get current facts
    results = await graphiti.search(
        query="Where does the user live and where do they work?",
        num_results=5
    )

    for result in results:
        print(f"Fact: {result.fact}")
        print(f"Valid from: {result.valid_at}")
        if result.invalid_at:
            print(f"Valid until: {result.invalid_at}")
        print("---")

    # Output includes both historical and current facts with validity periods

asyncio.run(main())

Zep also offers a cloud-hosted version if you don’t want to manage your own graph database infrastructure. The Graphiti library supports Neo4j, Amazon Neptune, FalkorDB, and Kuzu as database backends.

Cognee: Semantic memory with knowledge graphs

Cognee is an open-source library that builds semantic knowledge graphs from your data. It’s designed to provide sophisticated semantic understanding through knowledge graph construction.

How Cognee works

Traditional retrieval-augmented generation (RAG) systems embed documents into vectors and retrieve chunks based on similarity. This approach works, but it overlooks the relationships between concepts.

Cognee takes a different approach. When you add data, it builds a knowledge graph that captures entities and their relationships. The graph uses ontologies to understand how concepts relate to one another.

For example, if your documents mention “Python,” “machine learning,” and “scikit-learn,” Cognee doesn’t just store these keywords as separate chunks. It understands that Python is a programming language, machine learning is a field, and scikit-learn is a Python library for machine learning. These relationships enable smarter retrieval.

Cognee Semantic Architecture Diagram

Building knowledge graphs

Cognee combines two storage layers:

  • A vector database for semantic similarity search, which supports Weaviate, Qdrant, LanceDB, and others.

  • A graph database for relationship modeling, which supports Neo4j and Kuzu.

When you query Cognee, it searches both the vector database for semantically similar content and the graph database for related concepts. This hybrid approach lets you ask questions like “What Python libraries are mentioned?” and get answers that understand Python as a programming language context.

The knowledge graph is built automatically during the “cognify” process, which analyzes your data, extracts entities and relationships, and structures them according to domain ontologies.

Adding Cognee to your agent

Install Cognee:

pip install cognee

Run a basic implementation:

import cognee
import asyncio

async def main():
    # Add data to Cognee
    await cognee.add("Natural language processing (NLP) is an interdisciplinary subfield of computer science.")

    # Process into knowledge graph
    await cognee.cognify()

    # Query with semantic understanding
    result = await cognee.search(query_text="Tell me about NLP")
    print(result)

asyncio.run(main())

Work with multiple documents:

import cognee
import asyncio

async def main():
    # Add related documents
    documents = [
        "GPT-4 is a large language model developed by OpenAI.",
        "OpenAI was founded in 2015 by Sam Altman, Elon Musk, and others.",
        "Sam Altman is the CEO of OpenAI and formerly led Y Combinator."
    ]

    await cognee.add(documents)
    await cognee.cognify()

    # Query understanding relationships
    result = await cognee.search(query_text="Who leads OpenAI?")
    print(result)
    # Returns answer understanding the relationships between
    # Sam Altman, CEO role, and OpenAI

asyncio.run(main())

Lastly, configure database backends using environment variables, typically stored in a .env file in your project’s root directory. The library automatically loads these variables on startup.

Here is an example .env file for connecting to Qdrant and Neo4j:

LLM_PROVIDER="openai"
LLM_MODEL="gpt-4"
LLM_API_KEY="your_openai_api_key"

VECTOR_DB_PROVIDER="qdrant"
# QDRANT_URL="http://localhost:6333" # Optional: for non-default host

GRAPH_DB_PROVIDER="neo4j"
NEO4J_URI="bolt://localhost:7687"
NEO4J_USER="neo4j"
NEO4J_PASSWORD="password"

Cognee integrates with agent frameworks like LangChain and CrewAI, making it easy to add semantic memory to existing agent implementations. The knowledge graph construction is automatic, so you don’t need to manually define schemas or relationships.

Letta: Memory as infrastructure

Letta takes a fundamentally different approach from the other solutions in this article. While Mem0, Zep, and Cognee add memory capabilities to stateless agents, Letta is a general-purpose memory programming system for building stateful agents that have memory embedded into their core architecture.

How Letta works

The Letta Platform treats agents as persistent entities with continuous state, not as functions called on demand. When you create a Letta agent, it exists on the server and maintains state even when your application isn’t running. There are no sessions or conversation threads. Each agent has a single, perpetual existence.

This fundamentally changes how you think about agent development. Instead of fetching relevant memories before each LLM call, the agent continuously operates with its memory state. Memory isn’t something you manage externally. It’s part of the agent’s core architecture.

Letta Stateful Agents Diagram

Letta was built by the researchers behind MemGPT, which pioneered the concept of treating LLM context management similar to how an operating system manages memory. The platform provides sophisticated memory management that lets agents learn how to use tools, evolve personalities, and improve their capabilities over time.

Core memory and archival memory

Letta implements a two-tier memory architecture:

  • The core memory lives inside the agent’s context window and is always available. This is the most important component for making your agent maintain consistent state across turns. Core memory is organized into memory blocks that the agent can read and write to directly, holding the agent’s most critical, frequently accessed state.
  • The archival memory provides long-term retrieval for information outside the context window. When an agent needs information from archival memory, it retrieves the relevant pieces and loads them into core memory. Letta also supports file-based retrieval through the Letta Filesystem.

This hierarchy is similar to the way operating systems manage RAM and disk storage. Hot data stays in RAM (core memory), while cold data lives on disk (archival memory) and is paged in when needed.
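The paging analogy can be sketched in a few lines (a toy illustration, not Letta's implementation):

```python
class TwoTierMemory:
    """Toy core/archival hierarchy: hot data stays in core, cold data is paged in on demand."""

    def __init__(self, core_limit=3):
        self.core = []     # always in the context window (like RAM)
        self.archive = []  # external long-term storage (like disk)
        self.core_limit = core_limit

    def remember(self, item):
        self.core.append(item)
        if len(self.core) > self.core_limit:
            # Evict the oldest core item to the archive, like paging RAM to disk
            self.archive.append(self.core.pop(0))

    def recall(self, keyword):
        """Page matching archival items back into working context."""
        hits = [a for a in self.archive if keyword in a]
        for h in hits:
            self.archive.remove(h)
            self.remember(h)
        return hits


mem = TwoTierMemory(core_limit=2)
for fact in ["likes Python", "lives in Austin", "works at TechCorp"]:
    mem.remember(fact)
print(mem.core)     # ['lives in Austin', 'works at TechCorp']
print(mem.archive)  # ['likes Python']
mem.recall("Python")
print(mem.core)     # the Python fact has been paged back into core
```

The key property is that core capacity stays bounded (like a context window), while nothing is ever lost: evicted facts remain retrievable from the archive.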

Memory blocks: Self-modifying agent state

Memory blocks are the fundamental units of Letta’s memory system. Each block is a structured piece of information that the agent can modify. Unlike external memory systems, where you programmatically update memories through APIs, Letta agents can directly edit their own memory blocks.

For example, a Letta agent might maintain memory blocks like the following:

  • Human Block: Information about the user (such as name, preferences, and history)
  • Persona Block: The agent’s identity and behavior guidelines
  • Custom Blocks: Domain-specific state (such as project context and code knowledge)

These blocks are editable by the agent itself, other agents, or developers through the API. This enables agents to learn and adapt their behavior based on experience.

Memory blocks can also be shared between multiple agents. By attaching the same block to different agents, you can create powerful multi-agent systems where agents collaborate through shared memory. For example, multiple agents working on a project could share an “organization” or “project context” block, allowing them to maintain synchronized state and share learned information.

Building with Letta

Install Letta:

# For local server development
# The local server loads the sqlite-vec extension automatically; install it separately if it's missing
pip install letta

# For client SDK only (connecting to Letta Cloud or existing server)
pip install letta-client

Set up environment variables:

export OPENAI_API_KEY=your_openai_api_key

Start the local server:

letta server

The Letta platform includes a visual Agent Development Environment (ADE) that lets you:

  • Create and configure agents through a UI
  • View agent memory blocks in real-time
  • Monitor the agent’s decision-making process
  • Test and debug agent behavior

Create an agent with the Python SDK (on the local server):

from letta_client import Letta

# Connect to local Letta server
client = Letta(base_url="http://localhost:8283")

# Create agent with memory blocks
agent = client.agents.create(
    memory_blocks=[
        {
            "label": "human",
            "value": "Name: Sarah. Role: Senior Backend Engineer. Preferences: Prefers Python over JavaScript, likes concise explanations."
        },
        {
            "label": "persona",
            "value": "I am a helpful coding assistant. I adapt my explanations to the user's expertise level and communication style."
        }
    ],
    model="openai/gpt-4o-mini",
    embedding="openai/text-embedding-3-small"
)

print(f"Created agent: {agent.id}")

Chat with the agent:

# Send message to stateful agent
response = client.agents.messages.create(
    agent_id=agent.id,
    messages=[
        {"role": "user", "content": "Can you help me optimize this SQL query?"}
    ]
)

# Agent responds with context from its memory blocks
for message in response.messages:
    if hasattr(message, 'reasoning'):
        print(f"Reasoning: {message.reasoning}")
    if hasattr(message, 'content'):
        print(f"Content: {message.content}")

Update memory blocks programmatically:

# Retrieve a specific memory block by label
human_block = client.agents.blocks.retrieve(
    agent_id=agent.id,
    block_label="human"
)

# Update the block with new information
client.blocks.modify(
    block_id=human_block.id,
    value="Name: Sarah. Role: Senior Backend Engineer. Preferences: Prefers Python over JavaScript, likes concise explanations. Currently working on: Microservices migration project."
)

Use Letta Cloud (the hosted platform):

from letta_client import Letta

# Connect to Letta Cloud
client = Letta(token="YOUR_LETTA_API_KEY")

# Create agent (same API, hosted infrastructure)
agent = client.agents.create(
    model="openai/gpt-4.1",
    embedding="openai/text-embedding-3-small",
    memory_blocks=[
        {"label": "human", "value": "The human's name is Alex."},
        {"label": "persona", "value": "I am a helpful assistant."}
    ]
)

Create shared memory blocks for multi-agent systems:

# Create a shared memory block
shared_block = client.blocks.create(
    label="project_context",
    description="Shared knowledge about the current project",
    value="Project: Building a chat application. Tech stack: Python, FastAPI, React. Timeline: 3 months.",
    limit=4000
)

# Create multiple agents that share this block
agent1 = client.agents.create(
    memory_blocks=[
        {"label": "persona", "value": "I am a backend specialist."}
    ],
    block_ids=[shared_block.id],  # Attach shared block
    model="openai/gpt-4o-mini",
    embedding="openai/text-embedding-3-small"
)

agent2 = client.agents.create(
    memory_blocks=[
        {"label": "persona", "value": "I am a frontend specialist."}
    ],
    block_ids=[shared_block.id],  # Both agents share this block
    model="openai/gpt-4o-mini",
    embedding="openai/text-embedding-3-small"
)

# Both agents can now access and modify the shared project context

The key distinction is that Letta agents are stateful by design. The memory isn’t managed separately from the agent. It’s integral to how the agent operates. This enables more sophisticated agent behaviors, like self-improvement, personality evolution, and complex multi-turn interactions where the agent truly learns from experience.

Letta is also fully interoperable with other memory solutions. Through custom Python tools or Model Context Protocol (MCP) servers, you can use Mem0, Zep, Cognee, or any other tool within a Letta agent. This lets you combine Letta’s stateful architecture with specialized memory systems for specific use cases.

Letta’s AI Memory SDK

The AI Memory SDK provides pluggable memory for developers who want memory capabilities without adopting a full stateful agent platform. Although it is separate from the Letta Platform, the SDK was built by the same team and uses Letta agents (subagents) running under the hood on Letta Cloud for memory management. The result is Letta’s memory handling in a lightweight package that integrates with your existing LLM setup.

How the AI Memory SDK works

Like Mem0 or Zep, the AI Memory SDK enables you to initialize a memory instance, add conversations, and retrieve context when needed. The difference is that the AI Memory SDK runs background Letta agents (subconscious memory agents) that asynchronously process conversations between interactions (during “sleeptime”) rather than during them.

These background agents automatically maintain two memory blocks per user: a conversation summary and a user profile. You retrieve this learned context and inject it into your prompts, similar to Mem0 or Zep, but memory formation happens through autonomous agents rather than direct LLM extraction or graph algorithms.

For more complex memory architectures — such as custom memory blocks, agents that modify their own memory, or persistent agent personalities — use the full Letta Platform.

Implementing the AI Memory SDK

Install the SDK:

pip install ai-memory-sdk

Set up environment variables:

export LETTA_API_KEY=your_letta_api_key
export OPENAI_API_KEY=your_openai_api_key

Set up a basic integration with OpenAI:

from openai import OpenAI
from ai_memory_sdk import Memory

# Initialize memory and OpenAI client
memory = Memory()
openai_client = OpenAI()

def chat_with_memory(message, user_id="default_user"):
    # Get or initialize user memory
    user_memory = memory.get_user_memory(user_id)
    if not user_memory:
        memory.initialize_user_memory(user_id, reset=True)

    # Retrieve formatted memory for prompt
    memory_prompt = memory.get_user_memory(user_id, prompt_formatted=True)

    # Build system prompt with memory context
    system_prompt = f"<system>You are a helpful AI assistant.</system>\n{memory_prompt}"

    # Create messages
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": message}
    ]

    # Get response from LLM
    response = openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages
    )

    # Extract response and save to memory
    assistant_response = response.choices[0].message.content
    messages.append({"role": "assistant", "content": assistant_response})
    memory.add_messages(user_id, messages)

    return assistant_response

# Use it
response = chat_with_memory("Hi, I'm learning Python", user_id="alice")
print(response)

# Later conversation remembers context
response = chat_with_memory("What resources should I start with?", user_id="alice")
print(response)  # Agent remembers Alice is learning Python

Conduct a semantic search through conversation history:

# Search past conversations
results = memory.query(
    user_id="alice",
    query="What programming topics did we discuss?"
)

for result in results:
    print(f"Message: {result['content']}")
    print(f"Relevance: {result['score']}")
    print("---")

The AI Memory SDK automatically generates user summaries, maintains conversation history, and provides semantic search capabilities. It processes messages asynchronously, so memory updates happen in the background without blocking your application.

The SDK is experimental and actively developed. The roadmap includes TypeScript support, learning from files, and enhanced memory management features.

Excellent comprehensive comparison. The Letta-specific sections are accurate and well-explained.

A few observations from supporting users across these different approaches:

Architecture Choice Matters

The stateless vs stateful distinction you highlighted is the critical decision point. I’ve seen users try to retrofit stateful behavior onto stateless architectures (or vice versa), and it creates friction. Pick the paradigm that matches your use case from the start.

Shared Memory Blocks in Practice

The shared memory block pattern you demonstrated is one of Letta’s most powerful but underused features. Real-world multi-agent systems benefit enormously from it, but a heads-up: concurrent writes can create race conditions. memory_insert (append operations) is the most robust for shared blocks. memory_replace can fail if the target string changes between read and write. memory_rethink is last-writer-wins.

AI Memory SDK Status

You correctly noted it’s experimental. Cameron (Letta team) recently mentioned the SDK is “in kind of a weird spot, it’s kind of a prototype.” It works well for its intended use case (drop-in memory for existing apps), but for production systems requiring sophisticated memory management, the full Letta Platform provides more control and reliability.

Temporal vs Semantic Graphs

Your distinction between Zep’s temporal graphs and Cognee’s semantic graphs is spot-on. From support experience, I’ve noticed:

  • Temporal graphs excel when facts change frequently (user preferences, job changes, locations)
  • Semantic graphs excel when relationships between concepts matter more than change over time
  • Letta’s memory blocks work best when agents need to self-modify their knowledge and behavior

Minor Update

On memory block uniqueness, it’s worth emphasizing that block labels are NOT unique identifiers in Letta. Multiple blocks can share the same label. When attaching blocks to agents via the API, you need block_ids, not labels. Use the List Blocks API with label_search to find block IDs by label name.

This article would be an excellent addition to community resources. It covers the architectural tradeoffs more clearly than most vendor documentation.
