I am trying to use Letta with a local Ollama backend and a vision-capable model
(e.g. qwen2.5-vl, llama3.2-vision).
However, Letta’s /v1/models endpoint only lists text-only Ollama models,
and attempting to create an agent with an Ollama vision model fails.
Is it currently impossible to use Ollama + vision/multimodal models with Letta?
Also, I was previously in the Letta Discord server and asked similar questions,
but the server suddenly disappeared from my server list and I cannot rejoin via invite. If I was blocked, could you please let me know the reason?
Letta multimodal support: Per docs.letta.com/multimodal, image inputs work with compatible models. The docs explicitly list OpenAI (GPT-4o, etc.), Anthropic (Claude), and Google (Gemini) as supported.
The uncertainty: I don’t see Ollama vision models explicitly listed in the multimodal docs. This could mean:
It works but isn’t documented
There are integration gaps with how Ollama handles image payloads
It depends on the specific model and how it expects image data
What would help diagnose:
Which Ollama vision model are you trying to use?
What error or behavior are you seeing?
Are images being sent via URL or base64?
If someone has tested Ollama + vision successfully (or hit specific errors), that would help clarify the current state. I don’t want to claim it works without verification.
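One quick way to narrow this down is to compare what Ollama reports (via its /api/tags endpoint) against the handles Letta exposes (via /v1/models). A minimal offline sketch of that comparison, with the fetching left out and the model lists filled in as illustrative examples:

```python
# Sketch: given the model names reported by Ollama (/api/tags) and the
# handles Letta exposes (/v1/models), list which Ollama models Letta
# filtered out. The example lists below are illustrative, not fetched.
def missing_from_letta(ollama_models: list[str], letta_handles: list[str]) -> list[str]:
    # Letta prefixes Ollama models with "ollama/"; strip that for comparison.
    exposed = {h.removeprefix("ollama/") for h in letta_handles}
    return [m for m in ollama_models if m not in exposed]

ollama_models = ["qwen2.5:latest", "qwen2.5vl:latest", "llama3.2-vision:latest"]
letta_handles = ["ollama/qwen2.5:latest"]
print(missing_from_letta(ollama_models, letta_handles))
# The vision models are the ones that were filtered out.
```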
About vision models: I asked the same question to the ezra bot in the Letta Discord server (it was the only message I sent apart from saying hello).
I have verified that the qwen2.5 model works on my local Letta server with Ollama, but when I try to use a vision model (qwen2.5vl, llama3.2-vision), it does not appear in the model list in the Letta dashboard or the /v1/models endpoint, even though the models show up in Ollama and work fine when I run them there directly.
I also tried to create an agent with the Python SDK by explicitly providing the model name, and it returned an error like:
NOT_FOUND: Handle ollama/qwen2.5vl:latest not found, must be one of [‘ollama/qwen2.5:latest’]
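The error suggests Letta validates the requested handle against its filtered model list before creating the agent. Roughly like this (an illustrative sketch, not Letta's actual code; the function name is hypothetical):

```python
# Hypothetical sketch of the server-side handle check that would produce
# the NOT_FOUND error above. Not Letta's actual implementation.
def resolve_handle(handle: str, available: list[str]) -> str:
    """Return the handle if Letta knows it, otherwise fail like the API does."""
    if handle not in available:
        raise ValueError(
            f"NOT_FOUND: Handle {handle} not found, must be one of {available}"
        )
    return handle

# What Letta lists after filtering out non-tool-capable models:
available = ["ollama/qwen2.5:latest"]

try:
    resolve_handle("ollama/qwen2.5vl:latest", available)
except ValueError as e:
    print(e)  # mirrors the error message reported above
```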
TL;DR: This is an upstream Ollama limitation, not a Letta bug. Ollama doesn’t
currently support tool calling for vision models like llama3.2-vision and
qwen2.5-vl.
Why vision models don’t appear:
Letta filters Ollama models to only show those with “tools” in their
capabilities (via Ollama’s /api/show endpoint). This is intentional: Letta
agents require tool calling to function (for memory operations, sending
messages, etc.).
When you run ollama show llama3.2-vision, you’ll likely see that the
capabilities array contains only [“completion”, “vision”], without “tools”.
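The filtering step can be sketched like this (an offline sketch; the capability lists mirror what Ollama’s /api/show reports in its capabilities field, but the values here are illustrative, not fetched from a live server):

```python
# Sketch of Letta-style filtering: drop models whose capabilities (as
# reported by Ollama's /api/show) do not include "tools". The capability
# lists below are illustrative examples.
def tool_capable(models: dict[str, list[str]]) -> list[str]:
    """Keep only models whose capabilities include 'tools'."""
    return [name for name, caps in models.items() if "tools" in caps]

models = {
    "qwen2.5:latest": ["completion", "tools"],
    "qwen2.5vl:latest": ["completion", "vision"],        # no "tools"
    "llama3.2-vision:latest": ["completion", "vision"],  # no "tools"
}

print(tool_capable(models))  # only qwen2.5:latest survives the filter
```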
Interestingly, Meta’s documentation says llama3.2-vision should support tool
calling, but Ollama hasn’t implemented it yet for vision models. Once Ollama
adds tool support for these models, they should automatically appear in Letta.
Workaround:
Unfortunately there isn’t a good workaround right now - without tool calling,
the model can’t perform core agent functions like updating memory. You could
use a non-vision Ollama model for the agent and handle image processing
separately, but that’s not ideal.
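If you do go the “handle images separately” route, one shape it could take: caption the image by calling Ollama’s /api/generate directly (it accepts base64-encoded images in an images array), then pass the caption to the text-only Letta agent as a normal message. A sketch that only builds the request payload (actually sending it requires a running Ollama server; the prompt and model name are assumptions):

```python
# Sketch of the workaround: caption an image via Ollama's /api/generate,
# which accepts base64 images, then feed the caption to a text-only Letta
# agent. This only constructs the request payload; it does not send it.
import base64
import json

def build_caption_request(image_bytes: bytes, model: str = "qwen2.5vl:latest") -> dict:
    return {
        "model": model,
        "prompt": "Describe this image in detail.",  # illustrative prompt
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }

payload = build_caption_request(b"\x89PNG...")  # placeholder image bytes
print(json.dumps(payload)[:80])
# POST this payload to http://localhost:11434/api/generate, then send the
# returned caption to the Letta agent as an ordinary text message.
```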
I’d recommend following Ollama’s issue tracker for updates on when vision +
tools will be supported.