Custom Tool: Image Analysis via Self-Messaging

When a tool fetches an image, a Letta agent can’t directly “see” it in the tool’s return value. This custom tool pattern uses client injection to send the image (by URL or as base64 data) back to the same agent as a user message, so that a vision-capable model can process it.

Use Case

The agent calls a tool that retrieves an image URL → the agent needs to analyze the image visually → the tool below sends the image back to the agent as a user message attachment.

The Tool

def analyze_image_url(image_url: str, prompt: str = "Please analyze this image:") -> str:
    """
    Send an image URL back to this agent for visual analysis.
    Uses client injection to message self with image attachment.
    
    Args:
        image_url: Public URL of the image to analyze
        prompt: Text prompt to accompany the image
    
    Returns:
        Confirmation that image was sent for analysis
    """
    import os
    
    agent_id = os.getenv("LETTA_AGENT_ID")
    
    if not agent_id:
        return "Error: LETTA_AGENT_ID not configured in tool variables"
    
    # client is injected automatically on Letta Cloud
    client.agents.messages.create(
        agent_id=agent_id,
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {
                    "type": "image",
                    "source": {
                        "type": "url",
                        "url": image_url
                    }
                }
            ]
        }]
    )
    
    return "Image sent for analysis. You will see it in your next message."
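
The tool above passes image_url through unchecked. A minimal sketch of a pre-flight check that could run before the self-message, using only the standard library. The helper name looks_like_http_url is hypothetical and not part of Letta, and it checks only the shape of the URL, not whether the image is actually reachable:

```python
from urllib.parse import urlparse


def looks_like_http_url(url: str) -> bool:
    """Return True if url is an absolute http(s) URL with a host part.

    Hypothetical helper for illustration; it does not fetch the URL,
    so a True result does not guarantee the image is publicly accessible.
    """
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)
```

Inside the tool, a failed check could return an error string early, in the same way the missing LETTA_AGENT_ID case does.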

Base64 Variant

For images not publicly accessible:

def analyze_image_base64(
    base64_data: str, 
    media_type: str = "image/png",
    prompt: str = "Please analyze this image:"
) -> str:
    """
    Send a base64-encoded image back to this agent for visual analysis.
    
    Args:
        base64_data: Base64-encoded image data
        media_type: MIME type (image/png, image/jpeg, image/webp, image/gif)
        prompt: Text prompt to accompany the image
    
    Returns:
        Confirmation that image was sent for analysis
    """
    import os
    
    agent_id = os.getenv("LETTA_AGENT_ID")
    
    if not agent_id:
        return "Error: LETTA_AGENT_ID not configured in tool variables"
    
    # client is injected automatically on Letta Cloud
    client.agents.messages.create(
        agent_id=agent_id,
        messages=[{
            "role": "user", 
            "content": [
                {"type": "text", "text": prompt},
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": media_type,
                        "data": base64_data
                    }
                }
            ]
        }]
    )
    
    return "Image sent for analysis."
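
The base64 variant expects the caller to supply already-encoded data and a matching MIME type. A minimal sketch of how another tool might produce both from a local file, using only the standard library. The helper name encode_image_file is hypothetical, and the set of accepted types is simply the list from the docstring above:

```python
import base64
import mimetypes
from pathlib import Path

# MIME types listed in the analyze_image_base64 docstring
SUPPORTED_MEDIA_TYPES = {"image/png", "image/jpeg", "image/webp", "image/gif"}


def encode_image_file(path: str) -> tuple[str, str]:
    """Return (base64_data, media_type) for an image file on disk.

    Hypothetical helper for illustration. The MIME type is guessed from
    the file extension only; the file contents are not inspected.
    """
    media_type, _ = mimetypes.guess_type(path)
    if media_type not in SUPPORTED_MEDIA_TYPES:
        raise ValueError(f"Unsupported or unknown image type for {path!r}: {media_type}")
    data = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return data, media_type
```

The two returned values map directly onto the base64_data and media_type arguments of analyze_image_base64.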

Setup

  1. Add tool to agent via ADE Tool Manager or SDK
  2. Add tool variable: LETTA_AGENT_ID = your agent’s ID
  3. Use vision-capable model: GPT-4o, Claude 3.5+, or Gemini

Workflow Example

Agent has a tool that fetches screenshots:

User: "Take a screenshot of the dashboard and tell me what you see"
Agent: [calls screenshot_tool → returns URL]
Agent: [calls analyze_image_url with that URL]
Agent: [receives image in next turn, analyzes it]
Agent: "I can see the dashboard shows..."

Requirements

  • Letta Cloud (client injection)
  • Vision-capable model
  • LETTA_AGENT_ID tool variable configured
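
Because client exists only when it is injected, running the tool as plain Python outside Letta Cloud fails with a NameError. A minimal sketch for checking the payload shape locally, assuming a hand-rolled stand-in that mimics only the client.agents.messages.create path used above. The stand-in, the sent list, and the agent ID are all placeholders and not part of the Letta SDK:

```python
import os
from types import SimpleNamespace

sent = []  # records the keyword arguments of every create() call


def _record_create(**kwargs):
    sent.append(kwargs)


# Hypothetical stand-in for the injected client; it only records calls.
client = SimpleNamespace(
    agents=SimpleNamespace(messages=SimpleNamespace(create=_record_create))
)


def analyze_image_url(image_url: str, prompt: str = "Please analyze this image:") -> str:
    """Same logic as the tool above, reproduced so this sketch is self-contained."""
    agent_id = os.getenv("LETTA_AGENT_ID")
    if not agent_id:
        return "Error: LETTA_AGENT_ID not configured in tool variables"
    client.agents.messages.create(
        agent_id=agent_id,
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image", "source": {"type": "url", "url": image_url}},
            ],
        }],
    )
    return "Image sent for analysis. You will see it in your next message."


os.environ["LETTA_AGENT_ID"] = "agent-local-test"  # placeholder ID
result = analyze_image_url("https://example.com/dashboard.png")
```

After the call, sent[0] holds the exact arguments the real client would have received, which makes it easy to inspect the message structure before deploying the tool.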

Notes

  • This creates a new message in the agent’s history
  • The agent will process the image on its next turn
  • Works with any tool that produces image URLs or base64 data

Credit: Originated from Discord discussion with @jacbib7414