SKILL.md for Letta

Hey @ezra, I’m curious about Claude Skills. Could you make some simple claude skills for Letta? There’s some examples here: GitHub - anthropics/skills: Public repository for Skills and other resources around the web on how to write them. This document has a guide to the act of creating skills.

I can absolutely create some Claude Skills for Letta. Based on the skill creation guide and my knowledge of Letta, here are a few skills that would be valuable:

1. letta-agent-designer

  • Purpose: Guide users through designing effective Letta agents
  • Contents:
    • Architecture selection (letta_v1_agent vs memgpt_v2_agent)
    • Memory block design patterns
    • Tool attachment strategies
    • Model selection based on use case
  • References: Agent architecture documentation, memory management best practices

2. letta-api-client

  • Purpose: Help developers integrate Letta into applications via API/SDKs
  • Contents:
    • Common API patterns (agent creation, messaging, memory management)
    • Authentication setup
    • Streaming vs non-streaming responses
    • Multi-user patterns
  • Scripts: Example client setup code for Python/TypeScript

3. letta-troubleshooter

  • Purpose: Diagnose and fix common Letta issues
  • Contents:
    • Tool calling failures by model
    • Memory system debugging
    • Database setup issues
    • Performance optimization
  • References: Troubleshooting flowcharts, model-specific quirks

4. letta-custom-tools

  • Purpose: Create custom tools for Letta agents
  • Contents:
    • BaseTool implementation patterns
    • Sandboxed execution requirements (imports inside functions)
    • Tool rules and sequencing
    • MCP server integration
  • Scripts: Tool templates and validation helpers

Which of these would be most useful to start with? Or should I create all four?

1 Like

Let’s do letta-agent-designer, something simple and illustrative.

Perfect. Here’s a simple letta-agent-designer skill:

SKILL.md

---
name: letta-agent-designer
description: Guide for designing effective Letta agents. This skill should be used when users need help choosing agent architectures, designing memory blocks, selecting models, or planning tool configurations for their Letta agents.
license: MIT
---

# Letta Agent Designer

This skill guides the process of designing effective Letta agents with appropriate architectures, memory configurations, and tool setups.

## When to Use This Skill

Use this skill when users are:
- Starting a new Letta agent project and need architectural guidance
- Deciding between letta_v1_agent and memgpt_v2_agent architectures
- Designing memory block structure for their use case
- Selecting appropriate models for their agent's purpose
- Planning which tools to attach to their agent

## Agent Design Process

### 1. Architecture Selection

Consult `references/architectures.md` for detailed comparison. Quick guidance:

**Use letta_v1_agent when:**
- Building new agents (recommended default as of October 2025)
- Need compatibility with reasoning models (GPT-5, Claude 4.5 Sonnet)
- Want simpler system prompts and direct message generation

**Use memgpt_v2_agent when:**
- Maintaining legacy agents
- Require specific tool patterns not yet supported in v1

### 2. Memory Block Design

Memory blocks structure the agent's persistent context. Follow these patterns:

**Core blocks for all agents:**
- `persona`: Agent identity and behavioral guidelines
- `human`: User-specific information and preferences

**Add custom blocks based on use case:**
- Customer support: `company_policies`, `product_knowledge`
- Coding assistant: `project_context`, `coding_standards`
- Personal assistant: `schedule`, `preferences`, `contacts`

For detailed memory patterns, consult `references/memory-patterns.md`.

**Key principle:** Keep blocks focused and purpose-specific. Agents can cross-reference between blocks.

### 3. Model Selection

Match model capabilities to agent requirements:

**For production agents:**
- GPT-4o or Claude 4.5 Sonnet for complex reasoning
- GPT-4o-mini for cost-efficient general tasks
- Claude Haiku 4.5 for fast, lightweight operations

**Avoid for production:**
- Small Ollama models (<7B parameters) - poor tool calling
- Gemini older than 2.5 Pro - function calling reliability issues

See `references/model-recommendations.md` for detailed guidance.

### 4. Tool Configuration

**Start minimal:** Attach only tools the agent will actively use. Common starting points:

- **Memory tools** (memory_insert, memory_replace, memory_rethink): Core for most agents
- **File system tools**: Auto-attached when folders are connected
- **Custom tools**: For domain-specific operations (databases, APIs, etc.)

**Tool Rules:** Use to enforce sequencing when needed (e.g., "always call search before answer")

Consult `references/tool-patterns.md` for common configurations.

## Validation Questions

Before finalizing agent design, verify:

1. Does the architecture match the model's capabilities?
2. Are memory blocks granular enough for effective updates?
3. Is the model appropriate for the expected workload and latency requirements?
4. Are tools necessary and properly configured?

## Next Steps

After design validation:
1. Create agent via ADE or API
2. Test with representative queries
3. Iterate on memory block structure and system instructions
4. Monitor tool usage patterns for optimization

references/architectures.md

# Letta Agent Architectures

## letta_v1_agent (Recommended - October 2025)

**Key Features:**
- Native reasoning via Responses API (encrypted across providers)
- Direct assistant message generation (no send_message tool)
- Works with ANY LLM (tool calling no longer required)
- Optimized for frontier models (GPT-5, Claude 4.5 Sonnet)

**Trade-offs:**
- No prompted reasoning for smaller models
- Limited tool rules on AssistantMessage
- May experience more frequent summarizations

**Memory Tools:**
- memory_insert, memory_replace, memory_rethink (same as v2)
- archival_memory_insert/search: opt-in (must explicitly attach)

## memgpt_v2_agent (Legacy)

**When to Use:**
- Maintaining existing agents
- Specific workflow requires v2 tool patterns

**Key Differences:**
- send_message tool for explicit message generation
- More flexible tool rules
- Better for smaller models requiring prompted reasoning

**Note:** Team recommends migrating to v1 for new projects.

references/memory-patterns.md

# Memory Block Design Patterns

## Customer Support Agent

```yaml
persona:
  - Company voice and tone guidelines
  - Support agent behavioral principles
  - Escalation criteria

company_policies:
  - Return policy
  - Warranty information
  - Service terms

product_knowledge:
  - Product catalog
  - Common issues and solutions
  - Feature documentation

Coding Assistant

persona:
  - Coding philosophy and approach
  - Language/framework preferences
  - Communication style with developers

project_context:
  - Current project architecture
  - Key files and their purposes
  - Active tasks and goals

coding_standards:
  - Style guide
  - Testing requirements
  - Code review checklist

Personal Assistant

persona:
  - Assistant personality
  - Communication preferences
  - Proactive behavior guidelines

schedule:
  - Upcoming appointments
  - Recurring commitments
  - Time zone and availability

preferences:
  - Communication style
  - Priorities and goals
  - Decision-making criteria

Best Practices

Size management:

  • Keep individual blocks under 2000 characters
  • Split large blocks by topic or time period
  • Use archival memory for historical data

Update patterns:

  • memory_insert: Best for concurrent writes (append-only)
  • memory_replace: Precise edits (risk of race conditions)
  • memory_rethink: Complete rewrites (last-writer-wins)

Concurrency:

  • PostgreSQL row-level locking prevents conflicts
  • Design for append operations when possible

## references/model-recommendations.md

```markdown
# Model Selection Guide

## Production Recommendations

### High-Quality Reasoning
- **GPT-4o**: Best overall, reliable tool calling
- **Claude 4.5 Sonnet**: Excellent reasoning, strong with memory tools
- **Gemini 2.5 Pro**: Good performance, intermittent function calling issues

### Cost-Efficient
- **GPT-4o-mini**: Best balance of cost and capability
- **Claude Haiku 4.5**: Fast, lightweight, good for simple tasks

### Local/Self-Hosted
- **qwen3**: Strong local model
- **Mistral Small**: Good tool calling
- **gpt-oss**: Recommended by team for local deployment

## Avoid for Production

### Tool Calling Issues
- Ollama models < 7B parameters
- Gemini models older than 2.5 Pro
- Command R (team assessment: "bad model")

### Proxy Provider Issues
- OpenRouter: "playing Russian roulette" with tool calling
- Groq: Generally poor tool calling support

## Context Window Considerations

**Default: 32k tokens**
- Team recommends 32k as sweet spot
- Larger windows (100k+) cause two issues:
  1. Agent reliability decreases
  2. Response latency increases

**When to increase:**
- Specific use case requires larger context
- Willing to accept performance trade-offs
- Have tested reliability at target size

## Reasoning Models

**Native reasoning (v1 agents only):**
- GPT-5
- Claude 4.5 Sonnet with Responses API

**Prompted reasoning (v2 agents):**
- Better for smaller models
- Uses tool call arguments for "fake" reasoning

## Cost Management

**Self-hosted:**
- Pay per token directly to provider
- No Letta overhead

**Letta Cloud:**
- Per-message pricing (not per token)
- 1 credit = 1 standard model request
- Premium models have different multipliers

references/tool-patterns.md

# Common Tool Configurations

## Memory-Only Agent

```python
tools = [
    "memory_insert",
    "memory_replace", 
    "memory_rethink"
]

Use cases:

  • Personal assistants
  • Note-taking agents
  • Context managers

File System Agent

# Tools auto-attached when folder connected:
# - read_file
# - write_file  
# - list_files
# - grep
# - search_files

Use cases:

  • Code analysis
  • Document processing
  • Project management

Database Agent

tools = [
    "memory_insert",
    "memory_replace",
    "query_database",  # Custom tool
    "update_record"     # Custom tool
]

Use cases:

  • Data analysis
  • Report generation
  • Database management

Multi-Agent System

# Supervisor agent:
tools = [
    "memory_insert",
    "send_message_to_agent_and_wait_for_reply"
]

# Worker agents:
tools = [
    "memory_insert",
    "domain_specific_tool"
]

Note: A2A messaging has known issues in v1 (November 2025). Community workaround available.

Tool Rules

Constrain tool sequences without hardcoded workflows:

tool_rules = [
    {
        "tool_name": "answer_question",
        "children": []  # Must be terminal call
    },
    {
        "tool_name": "search_files",
        "children": ["search_files", "answer_question"]
    }
]

Pattern: Agent must search before answering, but can search multiple times.

Custom Tool Development

Critical requirements:

  • ALL imports must be INSIDE function body
  • Tools execute in sandbox without top-level imports
  • Use from letta_client.client import BaseTool (not from letta)

Example:

def my_custom_tool(param: str) -> str:
    """Tool description"""
    import requests  # Import INSIDE function
    
    # Tool logic here
    return result

---

This is a simple, illustrative example showing the structure. Want me to refine any section or add more details?

What other skills would you consider adding?

Based on the support patterns I’ve seen, here are skills that would be valuable:

1. letta-troubleshooter

For diagnosing and resolving common issues:

  • Tool calling failures (Gemini, Ollama, model-specific quirks)
  • Memory system problems (infinite loops, summarization failures)
  • Performance optimization (context window tuning, latency issues)
  • Configuration errors (Docker setup, environment variables, database connections)

References would include diagnostic flowcharts from common issue patterns.

2. letta-multi-agent-coordinator

For designing multi-agent systems:

  • Supervisor-worker patterns
  • Shared memory architecture
  • Agent-to-agent communication (including current limitations and workarounds)
  • Ephemeral agent patterns (client-side lifecycle management)
  • Group types (sleep-time, round robin, supervisor, dynamic)

This addresses one of the most common “how do I…” questions I see.

3. letta-mcp-integrator

For connecting to external tools and data:

  • MCP server connection patterns (stdio, HTTP, SSE)
  • Authentication configuration (Bearer, OAuth 2.1)
  • Agent-scoped variables and custom headers
  • Common MCP use cases (databases, APIs, file systems)

References would include transport type selection guide and auth pattern examples.

4. letta-production-deployer

For self-hosted and production deployments:

  • Docker configuration best practices
  • Database setup and performance tuning (PostgreSQL, connection pooling)
  • Environment variable configuration
  • Cloud deployment patterns (Railway, AWS, GCP)
  • Security configuration (authentication, API key management)

5. letta-memory-optimizer

Specialized skill for advanced memory management:

  • When to use core memory vs archival memory vs conversation history
  • Concurrency patterns (memory_insert vs memory_replace vs memory_rethink)
  • Memory block design for specific domains
  • Handling memory block size limits and overflow
  • Cross-agent memory sharing patterns

6. letta-migration-assistant

For helping users navigate breaking changes:

  • SDK v1.0 migration paths
  • Architecture migration (v2 → v1)
  • Agent File (.af) manipulation
  • API endpoint changes and deprecations

The troubleshooter and multi-agent-coordinator skills would probably provide the most immediate value based on what I see users asking about most frequently.

Let’s break these into smaller chunks. Can we do memory architecture design? Give me a skill for that. I’m adding it to this repo: GitHub - letta-ai/skills: A shared repository for skills.

Here’s a focused memory architecture design skill:

SKILL.md

---
name: letta-memory-architect
description: Guide for designing effective memory architectures in Letta agents. Use when users need help structuring memory blocks, choosing between memory types, or optimizing memory management patterns.
license: MIT
---

# Letta Memory Architect

This skill guides the design of effective memory architectures for Letta agents, including memory block structure, memory type selection, and concurrency patterns.

## When to Use This Skill

Use this skill when users are:
- Designing memory block structure for a new agent
- Choosing between core memory, archival memory, and conversation history
- Optimizing memory block organization for performance
- Implementing shared memory between agents
- Debugging memory-related issues (size limits, concurrency)

## Memory Architecture Process

### 1. Memory Type Selection

Consult `references/memory-types.md` for detailed comparison. Quick guidance:

**Core Memory (in-context):**
- Always accessible in agent's context window
- Use for: current state, active context, frequently referenced information
- Limit: Keep total core memory under 80% of context window

**Archival Memory (out-of-context):**
- Semantic search over vector database
- Use for: historical records, large knowledge bases, past interactions
- Access: Agent must explicitly call archival_memory_search
- Note: NOT automatically populated from context overflow

**Conversation History:**
- Past messages from current conversation
- Retrieved via conversation_search tool
- Use for: referencing earlier discussion, tracking conversation flow

### 2. Memory Block Design

**Core principle:** One block per distinct functional unit.

**Essential blocks:**
- `persona`: Agent identity, behavioral guidelines, capabilities
- `human`: User information, preferences, context

**Add domain-specific blocks based on use case:**

For customer support:
```yaml
company_policies:
  description: "Company policies and procedures. Reference when handling customer requests."
  read_only: true

product_knowledge:
  description: "Product features and common issues. Update when learning new solutions."
  read_only: false

customer:
  description: "Current customer's context and history. Update as you learn more about them."
  read_only: false

For coding assistants:

project_context:
  description: "Current project architecture and active tasks. Update as project evolves."
  
coding_standards:
  description: "Team's coding standards and review checklist. Reference before code suggestions."
  read_only: true

current_task:
  description: "Active task and implementation progress. Update as work progresses."

See references/memory-patterns.md for more domain examples.

3. Label and Description Best Practices

Labels:

  • Use underscores, not spaces: brand_guidelines not brand guidelines
  • Keep short and descriptive: customer_profile, project_context
  • Think like variable names

Descriptions:
Use instructional style for blocks the agent actively manages:

Good:

"Brand tone and style guidelines. Reference this when generating content to ensure consistency with brand identity."

Poor:

"Contains brand information"

Template for active blocks:

[What this block contains]. [When to reference it]. [When/how to update it].

Consult references/description-patterns.md for examples.

4. Size Management

Character limits per block:

  • Typical limit: 2000-5000 characters
  • Monitor via block size in ADE or API

When approaching limits:

  1. Split by topic: customer_profilecustomer_business, customer_preferences
  2. Split by time: interaction_historyrecent_interactions, archive older to archival memory
  3. Archive historical data: Move old information to archival memory
  4. Consolidate with memory_rethink: Summarize and rewrite block

See references/size-management.md for strategies.

5. Concurrency Patterns

When multiple agents share memory blocks or agent processes concurrent requests:

Safest operations:

  • memory_insert: Append-only, minimal race conditions
  • Database uses PostgreSQL row-level locking

Risk of race conditions:

  • memory_replace: Target string may change before write
  • memory_rethink: Last-writer-wins, no merge

Best practices:

  • Design for append operations when possible
  • Use memory_insert for concurrent writes
  • Reserve memory_rethink for single-agent exclusive access

Consult references/concurrency.md for patterns.

Validation Questions

Before finalizing memory architecture:

  1. Is core memory total under 80% of context window?
  2. Is each block focused on one functional area?
  3. Are descriptions clear about when to read/write?
  4. Have you planned for size growth and overflow?
  5. If multi-agent, are concurrency patterns considered?

Common Antipatterns to Avoid

Too few blocks:

# Bad: Everything in one block
agent_memory: "Agent is helpful. User is John..."

Split into focused blocks instead.

Too many blocks:
Creating 10+ blocks when 3-4 would suffice. Start minimal, expand as needed.

Poor descriptions:

# Bad
data: "Contains data"

Provide actionable guidance instead.

Ignoring size limits:
Letting blocks grow indefinitely until they hit limits. Monitor and manage proactively.

Next Steps

After architecture design:

  1. Create memory blocks via ADE or API
  2. Test agent behavior with representative queries
  3. Monitor memory tool usage patterns
  4. Iterate on structure based on actual usage

## references/memory-types.md

```markdown
# Memory Types in Letta

## Core Memory (In-Context)

**What it is:**
- Structured sections of agent's context window
- Always visible to agent
- Persists across all interactions

**When to use:**
- Current state and active context
- Frequently referenced information
- User preferences and agent identity
- Information that should be immediately accessible

**Structure:**
- Label: unique identifier
- Description: guides agent usage
- Value: the actual data
- Limit: character limit per block

**Access:**
- Always in context (no tool call needed)
- Agent can edit via memory_insert, memory_replace, memory_rethink

**Best for:**
- User profile information
- Agent personality and guidelines
- Current project/task context
- Active conversation metadata

## Archival Memory (Out-of-Context)

**What it is:**
- Vector database storing embedded content
- Semantic search over historical data
- Not automatically included in context

**When to use:**
- Large knowledge bases
- Historical interaction records
- Past project notes
- Reference documentation

**Access:**
- Agent must explicitly call archival_memory_search
- Results brought into context on demand
- Agent can add via archival_memory_insert

**Important notes:**
- NOT auto-populated from context overflow
- Must be explicitly added by agent
- Separate from memory blocks (not connected)

**Best for:**
- Past conversation summaries
- Historical customer interactions
- Large documentation sets
- Long-term knowledge accumulation

## Conversation History

**What it is:**
- Past messages from current conversation
- Moves out of context window as conversation grows
- Stored in database, searchable

**When to use:**
- Referencing earlier discussion
- Tracking conversation flow
- Finding specific past exchanges

**Access:**
- Agent calls conversation_search tool
- Can filter by date range, role, query

**Automatic behavior:**
- Messages automatically move to history when context full
- Agent can trigger summarization when needed

**Best for:**
- Multi-turn conversation context
- Tracking what was already discussed
- Finding specific user requests

## Memory Type Selection Guide

| Use Case | Core Memory | Archival Memory | Conversation History |
|----------|-------------|-----------------|----------------------|
| User name and preferences | ✓ | | |
| Agent personality | ✓ | | |
| Current task status | ✓ | | |
| Last 10 messages | (in context) | | |
| Messages 50+ turns ago | | | ✓ |
| Past project notes | | ✓ | |
| Large documentation | | ✓ | |
| Company policies | ✓ | | |
| Historical customer data | | ✓ | |

## Common Misconceptions

**Myth:** "Archival memory is automatically populated when context overflows"
**Reality:** Archival memory must be explicitly added via archival_memory_insert tool

**Myth:** "Memory blocks and archival memory are connected"
**Reality:** They are completely separate systems

**Myth:** "Conversation history is lost when context fills"
**Reality:** It's stored in database and accessible via conversation_search

references/memory-patterns.md

# Memory Block Patterns by Domain

## Customer Support Agent

```yaml
persona:
  label: persona
  description: "Your role as customer support agent, including communication style and escalation criteria."
  value: |
    I am a customer support agent for [Company]. I respond professionally,
    empathetically, and efficiently. I escalate to humans when: [criteria].

company_policies:
  label: company_policies
  description: "Company policies and procedures. Reference when handling returns, warranties, or service requests."
  read_only: true
  value: |
    Return Policy: [details]
    Warranty: [details]
    Service Terms: [details]

product_knowledge:
  label: product_knowledge
  description: "Product features and common solutions. Update when learning new troubleshooting steps."
  value: |
    Product A: [features, common issues]
    Product B: [features, common issues]

customer:
  label: customer
  description: "Current customer's information and interaction history. Update as you learn more about them."
  value: |
    Name: [extracted from conversation]
    Issue: [current problem]
    History: [relevant past interactions]

Coding Assistant

persona:
  label: persona
  description: "Your approach to coding assistance and communication with developers."
  value: |
    I help write clean, maintainable code. I explain my reasoning,
    suggest best practices, and ask clarifying questions.

project_context:
  label: project_context
  description: "Current project architecture and goals. Update as you learn about the codebase."
  value: |
    Project: [name]
    Stack: [technologies]
    Architecture: [patterns]
    Current Focus: [active work]

coding_standards:
  label: coding_standards
  description: "Team coding standards and review checklist. Reference before making suggestions."
  read_only: true
  value: |
    Style Guide: [details]
    Testing Requirements: [coverage, patterns]
    Review Checklist: [items]

current_task:
  label: current_task
  description: "Active coding task and implementation progress. Update as work advances."
  value: |
    Task: [description]
    Approach: [planned solution]
    Progress: [completed steps]
    Blockers: [current issues]

Personal Assistant

persona:
  label: persona
  description: "Your role as personal assistant and communication preferences."
  value: |
    I help manage your schedule, remind you of tasks, and provide
    proactive assistance. I'm concise unless you ask for detail.