Question about ollama+vision model support

I am trying to use Letta with a local Ollama backend and a vision-capable model
(e.g. qwen2.5-vl, llama3.2-vision).

However, Letta’s /v1/models endpoint only lists text-only Ollama models,
and attempting to create an agent with an Ollama vision model fails.
Is it currently impossible to use Ollama + vision/multimodal models with Letta?

Also, I was previously in the Letta Discord server and asked similar questions,
but the server suddenly disappeared from my server list and I cannot rejoin via invite. If I was blocked, could you please let me know the reason?

Thanks!

Unclear why you were blocked – can you send me your username?

Regarding vision models, I was under the assumption that they worked. @ezra do you have any information here?

Here’s what I know about Ollama + vision:

Ollama has vision-capable models:

  • llama3.2-vision
  • llava
  • bakllava
  • moondream

Letta multimodal support: Per docs.letta.com/multimodal, image inputs work with compatible models. The docs explicitly list OpenAI (GPT-4o, etc.), Anthropic (Claude), and Google (Gemini) as supported.

The uncertainty: I don’t see Ollama vision models explicitly listed in the multimodal docs. This could mean:

  1. It works but isn’t documented
  2. There are integration gaps with how Ollama handles image payloads
  3. It depends on the specific model and how it expects image data
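On point 2/3, a minimal sketch of Ollama's native image payload shape may be useful: its /api/chat endpoint takes images as raw base64 strings in an `images` array on the message, unlike OpenAI-style `image_url` content parts, which is exactly where an integration gap could arise. The model name and image bytes below are placeholders:

```python
import base64
import json

def ollama_chat_payload(model: str, prompt: str, image_bytes: bytes) -> str:
    """Build an Ollama /api/chat request body with an inline base64 image."""
    message = {
        "role": "user",
        "content": prompt,
        # Ollama's native API expects bare base64 (no "data:" URI prefix),
        # unlike OpenAI-style image_url content parts.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
    }
    return json.dumps({"model": model, "messages": [message], "stream": False})

payload = ollama_chat_payload("llama3.2-vision", "Describe this image.", b"<png bytes>")
```

Any bridge between an OpenAI-compatible client and Ollama has to translate between these two shapes, so knowing whether images arrive as URLs or base64 matters for the diagnosis.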

What would help diagnose:

  • Which Ollama vision model are you trying to use?
  • What error or behavior are you seeing?
  • Are images being sent via URL or base64?

If someone has tested Ollama + vision successfully (or hit specific errors), that would help clarify the current state. I don’t want to claim it works without verification.

Thank you for the response!

My username is “sydsim” (display name is “시드”)

About vision models: I asked the same question to the ezra bot in the Letta Discord server (it was the only message I sent other than saying hello).

I have confirmed that the qwen2.5 model works in my local Letta server with Ollama. However, when I try to use a vision model (qwen2.5vl, llama3.2-vision), it does not appear in the model list in the Letta dashboard or at the /v1/models endpoint, even though Ollama lists it and it works fine when I run it directly in Ollama.

I also tried to create an agent with the Python SDK by explicitly providing the model name, and it returned an error like:
NOT_FOUND: Handle ollama/qwen2.5vl:latest not found, must be one of ['ollama/qwen2.5:latest']

You’ve been unbanned, apologies!

TL;DR: This is an upstream Ollama limitation, not a Letta bug. Ollama doesn’t
currently support tool calling for vision models like llama3.2-vision and
qwen2.5-vl.

Why vision models don’t appear:

  • Letta filters Ollama models to only show those with “tools” in their
    capabilities (via Ollama’s /api/show endpoint). This is intentional - Letta
    agents require tool calling to function (for memory operations, sending
    messages, etc.).

  • When you run ollama show llama3.2-vision, you’ll likely see that the
    capabilities array contains only [“completion”, “vision”], not “tools”.
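The filtering described above can be sketched as follows. The capability lists are illustrative examples of what Ollama’s /api/show returns, not live data, and the function is a stand-in rather than Letta’s actual internal code:

```python
def tool_capable_models(capabilities_by_model: dict[str, list[str]]) -> list[str]:
    """Keep only models whose capability list includes 'tools'."""
    return [name for name, caps in capabilities_by_model.items() if "tools" in caps]

# Illustrative capability data, shaped like the "capabilities" field from
# Ollama's POST /api/show (or `ollama show <model>`):
example = {
    "qwen2.5:latest": ["completion", "tools"],
    "qwen2.5vl:latest": ["completion", "vision"],
    "llama3.2-vision:latest": ["completion", "vision"],
}

print(tool_capable_models(example))  # only the text model remains
```

With capability lists like these, only qwen2.5 survives the filter, which matches the NOT_FOUND error above listing ollama/qwen2.5:latest as the sole valid handle.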

The upstream issue:

Interestingly, Meta’s documentation says llama3.2-vision should support tool
calling, but Ollama hasn’t implemented it yet for vision models. Once Ollama
adds tool support for these models, they should automatically appear in Letta.

Workaround:

Unfortunately there isn’t a good workaround right now - without tool calling,
the model can’t perform core agent functions like updating memory. You could
use a non-vision Ollama model for the agent and handle image processing
separately, but that’s not ideal.
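A rough sketch of that two-step workaround, with the vision call stubbed out: in practice the captioner would POST the base64 image to Ollama’s /api/chat with a vision model, and the resulting text-only prompt would go to a tool-capable Letta agent. All names here are hypothetical:

```python
from typing import Callable

def describe_then_ask(image_bytes: bytes, question: str,
                      caption_image: Callable[[bytes], str]) -> str:
    """Caption an image outside the agent, then fold it into a text-only prompt."""
    caption = caption_image(image_bytes)
    return (
        "An image was provided. A vision model described it as: "
        f"{caption}\n\nQuestion: {question}"
    )

# Stubbed captioner standing in for an Ollama vision-model call:
prompt = describe_then_ask(b"<image>", "What color is the car?",
                           lambda _img: "a red car parked on a street")
```

The obvious downside is that the agent only ever sees the caption, not the image, so anything the vision model omits is lost.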

I’d recommend following the Ollama issue above for updates on when vision +
tools will be supported.