Image Analysis via Self-Messaging Tool
When tools fetch images, Letta agents can’t directly “see” them in tool returns. This custom tool pattern uses client injection to send image URLs back to the agent as user messages, enabling vision model processing.
Use Case
Agent uses a tool that retrieves an image URL → agent needs to visually analyze the image → tool sends image back as a user message attachment.
The Tool
def analyze_image_url(image_url: str, prompt: str = "Please analyze this image:") -> str:
    """
    Send an image URL back to this agent for visual analysis.
    Uses client injection to message self with image attachment.

    Args:
        image_url: Public URL of the image to analyze
        prompt: Text prompt to accompany the image

    Returns:
        Confirmation that image was sent for analysis
    """
    import os

    agent_id = os.getenv("LETTA_AGENT_ID")
    if not agent_id:
        return "Error: LETTA_AGENT_ID not configured in tool variables"

    # client is injected automatically on Letta Cloud
    client.agents.messages.create(
        agent_id=agent_id,
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {
                    "type": "image",
                    "source": {
                        "type": "url",
                        "url": image_url
                    }
                }
            ]
        }]
    )
    return "Image sent for analysis. You will see it in your next message."
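Because the tool expects a public URL, it can help to reject obviously unusable values before messaging the agent. The following is a minimal sketch, not part of Letta: the helper name `looks_like_http_url` is hypothetical, and it only checks the shape of the URL, not whether the image is actually reachable.

```python
from urllib.parse import urlparse


def looks_like_http_url(image_url: str) -> bool:
    """Hypothetical helper: True if the value is shaped like an http(s) URL.

    This does not fetch the URL, so it cannot confirm the image is public.
    """
    parsed = urlparse(image_url)
    # Require an http/https scheme and a non-empty host part
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)
```

A tool could call this first and return an error string (in the same style as the LETTA_AGENT_ID check) when it returns False.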
Base64 Variant
For images not publicly accessible:
def analyze_image_base64(
    base64_data: str,
    media_type: str = "image/png",
    prompt: str = "Please analyze this image:"
) -> str:
    """
    Send a base64-encoded image back to this agent for visual analysis.

    Args:
        base64_data: Base64-encoded image data
        media_type: MIME type (image/png, image/jpeg, image/webp, image/gif)
        prompt: Text prompt to accompany the image
    """
    import os

    agent_id = os.getenv("LETTA_AGENT_ID")
    if not agent_id:
        return "Error: LETTA_AGENT_ID not configured"

    # client is injected automatically on Letta Cloud
    client.agents.messages.create(
        agent_id=agent_id,
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": media_type,
                        "data": base64_data
                    }
                }
            ]
        }]
    )
    return "Image sent for analysis."
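The base64 variant needs a `base64_data` string and a matching `media_type`. As a rough sketch of how another tool might produce both from a local file, using only the Python standard library: the helper name `encode_image_file` and the extension-to-MIME mapping are assumptions for illustration, limited to the four MIME types named in the docstring above.

```python
import base64
from pathlib import Path

# Extension -> MIME type, limited to the types listed in the docstring above
MEDIA_TYPES = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".webp": "image/webp",
    ".gif": "image/gif",
}


def encode_image_file(path: str) -> tuple[str, str]:
    """Hypothetical helper: return (base64_data, media_type) for a local image."""
    suffix = Path(path).suffix.lower()
    if suffix not in MEDIA_TYPES:
        raise ValueError(f"Unsupported image extension: {suffix!r}")
    raw = Path(path).read_bytes()
    # b64encode returns bytes; decode to str so it can go into the message payload
    return base64.b64encode(raw).decode("ascii"), MEDIA_TYPES[suffix]
```

The two return values map directly onto the `base64_data` and `media_type` arguments of `analyze_image_base64`.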
Setup
- Add tool to agent via ADE Tool Manager or SDK
- Add tool variable: LETTA_AGENT_ID = your agent’s ID
- Use a vision-capable model: GPT-4o, Claude 3.5+, or Gemini
Workflow Example
Agent has a tool that fetches screenshots:
User: "Take a screenshot of the dashboard and tell me what you see"
Agent: [calls screenshot_tool → returns URL]
Agent: [calls analyze_image_url with that URL]
Agent: [receives image in next turn, analyzes it]
Agent: "I can see the dashboard shows..."
Requirements
- Letta Cloud (client injection)
- Vision-capable model
- LETTA_AGENT_ID tool variable configured
Notes
- This creates a new message in the agent’s history
- The agent will process the image on its next turn
- Works with any tool that produces image URLs or base64 data
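Since the URL and base64 variants differ only in the image `source` block, the message payload can be built by one function. This is a sketch under that assumption; `build_image_message` is a hypothetical name, and the dictionary layout simply mirrors the two tools above.

```python
from typing import Optional


def build_image_message(
    prompt: str,
    image_url: Optional[str] = None,
    base64_data: Optional[str] = None,
    media_type: str = "image/png",
) -> dict:
    """Hypothetical helper: build the user message used by both tool variants."""
    # Exactly one image source must be supplied
    if (image_url is None) == (base64_data is None):
        raise ValueError("Provide exactly one of image_url or base64_data")
    if image_url is not None:
        source = {"type": "url", "url": image_url}
    else:
        source = {"type": "base64", "media_type": media_type, "data": base64_data}
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image", "source": source},
        ],
    }
```

The result would be passed as `messages=[build_image_message(...)]` in the same `client.agents.messages.create` call the tools use.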
Credit: Originated from Discord discussion with @jacbib7414