Provider Support Matrix¶
What each LLM provider supports for multimodal content.
Overview¶
Different LLM providers handle media differently. The MCP Mesh SDK automatically converts media to each provider's native format, but capabilities vary.
Support Matrix¶
| Content Type | Claude | OpenAI | Gemini |
|---|---|---|---|
| Images (PNG, JPEG, GIF, WebP) | Native image blocks | image_url (base64) | image_url (base64) |
| Native document blocks | Text extraction fallback | Text extraction fallback | |
| Text files (plain, CSV, MD, HTML, JSON) | Text content blocks | Text content blocks | Text content blocks |
| Images in tool results | Inline in tool message | Separate user message | Separate user message |
Image Handling¶
All three providers support images, but with different mechanics:
Claude (Anthropic)¶
- Images supported in both user messages and tool result messages
- Native
imagecontent blocks with base64 encoding - Supports PNG, JPEG, GIF, WebP
- Best multimodal experience -- images appear inline with tool results
OpenAI¶
- Images supported in user messages only
- Uses
image_urlcontent blocks with base64 data URIs - When a tool returns an image, the SDK sends it as a follow-up user message
- Supports PNG, JPEG, GIF, WebP
Gemini¶
- Similar to OpenAI -- images in user messages only
- Uses
image_urlformat compatible with OpenAI - Tool result images sent as follow-up user messages
Claude is Recommended for Media-Heavy Workloads
Claude provides the best multimodal experience because it supports images directly in tool result messages. This means the LLM sees the image in context with the tool output, rather than as a separate message.
PDF Handling¶
| Provider | Support |
|---|---|
| Claude | Native document blocks -- full PDF understanding |
| OpenAI | Text extraction fallback (first 50,000 characters) |
| Gemini | Text extraction fallback (first 50,000 characters) |
Text File Handling¶
All providers receive text files as plain text content blocks. Files are decoded as UTF-8 (with Latin-1 fallback) and truncated to 50,000 characters.
Supported text MIME types:
text/plain,text/csv,text/markdown,text/html,text/xmlapplication/json,application/xml,application/csv
Provider Selection for Multimodal¶
When building multimodal agents, select providers based on your media needs:
See Also¶
- LLM Media Input -- Passing media to LLMs
- Returning Media -- How tools produce media
- LLM Integration (Python) -- Full LLM documentation