Skip to content

Provider Support Matrix

What each LLM provider supports for multimodal content.

Overview

Different LLM providers handle media differently. The MCP Mesh SDK automatically converts media to each provider's native format, but capabilities vary.

Support Matrix

Content Type Claude OpenAI Gemini
Images (PNG, JPEG, GIF, WebP) Native image blocks image_url (base64) image_url (base64)
PDF Native document blocks Text extraction fallback Text extraction fallback
Text files (plain, CSV, MD, HTML, JSON) Text content blocks Text content blocks Text content blocks
Images in tool results Inline in tool message Separate user message Separate user message

Image Handling

All three providers support images, but with different mechanics:

Claude (Anthropic)

  • Images supported in both user messages and tool result messages
  • Native image content blocks with base64 encoding
  • Supports PNG, JPEG, GIF, WebP
  • Best multimodal experience -- images appear inline with tool results

OpenAI

  • Images supported in user messages only
  • Uses image_url content blocks with base64 data URIs
  • When a tool returns an image, the SDK sends it as a follow-up user message
  • Supports PNG, JPEG, GIF, WebP

Gemini

  • Similar to OpenAI -- images in user messages only
  • Uses image_url format compatible with OpenAI
  • Tool result images sent as follow-up user messages

Claude is Recommended for Media-Heavy Workloads

Claude provides the best multimodal experience because it supports images directly in tool result messages. This means the LLM sees the image in context with the tool output, rather than as a separate message.

PDF Handling

Provider Support
Claude Native document blocks -- full PDF understanding
OpenAI Text extraction fallback (first 50,000 characters)
Gemini Text extraction fallback (first 50,000 characters)

Text File Handling

All providers receive text files as plain text content blocks. Files are decoded as UTF-8 (with Latin-1 fallback) and truncated to 50,000 characters.

Supported text MIME types:

  • text/plain, text/csv, text/markdown, text/html, text/xml
  • application/json, application/xml, application/csv

Provider Selection for Multimodal

When building multimodal agents, select providers based on your media needs:

# Prefer Claude for image-heavy workloads
@mesh.llm(
    provider={"capability": "llm", "tags": ["+claude"]},
    filter=[{"capability": "chart_gen"}],
)
mesh.llm({
  provider: { capability: "llm", tags: ["+claude"] },
  filter: [{ capability: "chart_gen" }],
})
@MeshLlm(
    providerSelector = @Selector(capability = "llm", tags = {"+claude"}),
    filter = @Selector(capability = "chart_gen")
)

See Also