LLM Media Input

Pass images, PDFs, and other files directly to LLM agents.

Overview

The media= parameter lets you attach media when calling an LLM agent. The SDK resolves URIs from MediaStore and converts them to provider-native formats automatically.
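
To make "provider-native formats" concrete, the sketch below wraps the same image bytes for two common chat APIs. It is an illustration only, not the SDK's implementation: `to_openai_part` and `to_anthropic_part` are hypothetical helper names, and the dictionary shapes follow the publicly documented OpenAI-style `image_url` part and Anthropic-style base64 `image` block.

```python
import base64


def to_openai_part(data: bytes, mime_type: str) -> dict:
    """Wrap bytes as an OpenAI-style image_url part using a data URL."""
    encoded = base64.b64encode(data).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
    }


def to_anthropic_part(data: bytes, mime_type: str) -> dict:
    """Wrap bytes as an Anthropic-style base64 image block."""
    encoded = base64.b64encode(data).decode("ascii")
    return {
        "type": "image",
        "source": {"type": "base64", "media_type": mime_type, "data": encoded},
    }
```

With `media=`, this kind of conversion happens inside the SDK, so calling code stays the same regardless of provider.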

Basic Usage

Python

@mesh.llm(provider={"capability": "llm"})
@mesh.tool(capability="analyzer")
async def analyze(question: str, llm: mesh.MeshLlmAgent = None) -> str:
    # Option 1: a single image URI
    return await llm("Describe this image", media=["file:///tmp/photo.png"])

    # Option 2: raw bytes paired with a MIME type
    # (png_bytes is assumed to hold the image content)
    # return await llm("What is this?", media=[(png_bytes, "image/png")])

    # Option 3: multiple items in one call
    # return await llm("Compare these", media=[
    #     "file:///tmp/a.png",
    #     "s3://bucket/b.jpg",
    # ])

TypeScript

execute: async ({ question }, { llm }) => {
  // Option 1: a single URI
  return await llm("Describe this image", {
    media: ["file:///tmp/photo.png"],
  });

  // Option 2: a Buffer paired with a MIME type
  // (pngBuffer is assumed to hold the image content)
  // return await llm("What is this?", {
  //   media: [{ data: pngBuffer, mimeType: "image/png" }],
  // });

  // Option 3: multiple items in one call
  // return await llm("Compare these", {
  //   media: ["file:///tmp/a.png", "s3://bucket/b.jpg"],
  // });
}

Java

// imageUri is assumed to hold a media URI such as "file:///tmp/photo.png"
return llm.request()
    .user("Describe this image")
    .media(imageUri)
    .generate();

Media Item Types

Python

Each item in the media list can be:

Type	Format	Example
URI string	`str`	`"file:///tmp/photo.png"`
Bytes tuple	`tuple[bytes, str]`	`(png_bytes, "image/png")`
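
As a rough illustration of how the two Python item types can be told apart, the sketch below normalizes a mixed media list into one uniform record type. `NormalizedMedia` and `normalize_media` are hypothetical names introduced for this example; they are not part of the SDK, which performs its own handling internally.

```python
from dataclasses import dataclass
from typing import Optional, Union

# The two item shapes listed in the table above.
MediaItem = Union[str, tuple[bytes, str]]


@dataclass
class NormalizedMedia:
    uri: Optional[str] = None        # set for URI string items
    data: Optional[bytes] = None     # set for bytes tuple items
    mime_type: Optional[str] = None  # set for bytes tuple items


def normalize_media(items: list[MediaItem]) -> list[NormalizedMedia]:
    """Map each media item to a NormalizedMedia record, rejecting other shapes."""
    normalized = []
    for item in items:
        if isinstance(item, str):
            normalized.append(NormalizedMedia(uri=item))
        elif (
            isinstance(item, tuple)
            and len(item) == 2
            and isinstance(item[0], (bytes, bytearray))
            and isinstance(item[1], str)
        ):
            normalized.append(
                NormalizedMedia(data=bytes(item[0]), mime_type=item[1])
            )
        else:
            raise TypeError(f"Unsupported media item: {item!r}")
    return normalized
```

A mixed list such as `["file:///tmp/photo.png", (png_bytes, "image/png")]` yields one URI record followed by one bytes record, which mirrors how both shapes can appear together in a single `media=` argument.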

TypeScript

Each item in the media array can be:

Type	Format	Example
URI string	`string`	`"file:///tmp/photo.png"`
Buffer object	`{ data: Buffer, mimeType: string }`	`{ data: pngBuffer, mimeType: "image/png" }`

Automatic Resource Link Resolution

You don't need to pass media= explicitly when the LLM calls a tool that returns a resource_link. Resolution is automatic:

1. LLM calls tool -> tool returns resource_link
2. SDK detects resource_link in tool result
3. SDK fetches media bytes from MediaStore
4. SDK converts to provider-native format
5. LLM sees the actual image/document content
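
The steps above can be sketched roughly as follows. This is a simplified model under stated assumptions, not the SDK's code: `FAKE_MEDIA_STORE` is a hypothetical in-memory stand-in for MediaStore, `resolve_resource_links` is an invented helper, the tool result is assumed to be a list of MCP-style content items, and the output shape is a generic placeholder rather than any real provider's format.

```python
import base64

# Hypothetical in-memory stand-in for MediaStore: URI -> (bytes, MIME type).
FAKE_MEDIA_STORE = {
    "file:///tmp/chart.png": (b"\x89PNG-fake-bytes", "image/png"),
}


def resolve_resource_links(tool_result: list[dict]) -> list[dict]:
    """Replace resource_link items with inline base64 media parts."""
    resolved = []
    for item in tool_result:
        # Step 2: detect resource_link items; everything else passes through.
        if item.get("type") != "resource_link":
            resolved.append(item)
            continue
        # Step 3: fetch the media bytes for the linked URI.
        data, mime_type = FAKE_MEDIA_STORE[item["uri"]]
        # Step 4: convert to an inline part (generic shape for illustration).
        resolved.append({
            "type": "media",
            "mime_type": mime_type,
            "data_base64": base64.b64encode(data).decode("ascii"),
        })
    return resolved


# Step 1: a tool returns text plus a resource_link.
tool_result = [
    {"type": "text", "text": "Chart generated."},
    {"type": "resource_link", "uri": "file:///tmp/chart.png", "name": "chart"},
]
# Step 5: the LLM input now carries the media content itself, not just a link.
llm_input = resolve_resource_links(tool_result)
```
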

This means a simple agentic loop also handles multimodal content: