
LLM Kwargs Reference

Per-call generation parameters forwarded from @mesh.llm consumers to the underlying vendor SDK.

Overview

model_params is the per-call kwarg surface that flows from a @mesh.llm consumer, through the mesh proxy, to the resolved provider, and finally into the vendor's native SDK (Anthropic, OpenAI, Gemini). Mesh does not invent its own parameter names — common knobs (max_tokens, temperature, ...) work across all three vendors; vendor-specific knobs (thinking_config, reasoning_effort, top_k, ...) unlock per-vendor features.

The exact passthrough surface for each Python adapter lives in _<VENDOR>_PASSTHROUGH_KWARGS (see Source pointers below). Anything outside that set is logged as a once-per-key WARN by the adapter so a typo or a newer litellm-only knob surfaces immediately instead of being silently dropped.
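The filtering pattern is roughly the following. This is a minimal sketch, not the adapters' actual code: the set contents, the filter_model_params helper, and the module-level guard are illustrative; only the frozenset-plus-once-per-key-WARN shape mirrors the behavior described above.

import logging

logger = logging.getLogger(__name__)

# Illustrative stand-in; the real frozensets live in the native client
# modules listed under "Source pointers" below.
_EXAMPLE_PASSTHROUGH_KWARGS = frozenset({"max_tokens", "temperature", "top_p", "stop", "seed"})
_WARNED_KEYS: set[str] = set()  # hypothetical once-per-key guard

def filter_model_params(model_params: dict) -> dict:
    """Forward known kwargs; WARN once per unknown key instead of dropping it silently."""
    forwarded = {}
    for key, value in model_params.items():
        if key in _EXAMPLE_PASSTHROUGH_KWARGS:
            forwarded[key] = value
        elif key not in _WARNED_KEYS:
            _WARNED_KEYS.add(key)
            logger.warning("model_params key %r is not supported by this adapter; dropping", key)
    return forwarded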

Common kwargs (cross-vendor)

These work across Anthropic, OpenAI, and Gemini:

| Kwarg | Type | Purpose |
| --- | --- | --- |
| max_tokens | int | Maximum tokens to generate in the response. |
| temperature | float (0.0 - 2.0) | Sampling temperature. Lower = more deterministic. |
| top_p | float (0.0 - 1.0) | Nucleus sampling cutoff. Alternative to temperature. |
| stop | list[str] | Stop sequences that halt generation when produced. (Anthropic: stop_sequences.) |
| seed | int | Best-effort determinism seed. Honored by OpenAI and Gemini; ignored by Anthropic. |

Mesh forwards these unchanged for OpenAI and Gemini. For Anthropic, mesh translates stop → stop_sequences to match the native SDK.
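Concretely, the translation is just a key rename before the kwargs reach the Anthropic SDK. A minimal sketch, not the adapter's actual code:

params = {"max_tokens": 1024, "temperature": 0.3, "stop": ["END"]}

# Anthropic's native SDK calls this kwarg stop_sequences; mesh renames it.
if "stop" in params:
    params["stop_sequences"] = params.pop("stop")

assert params == {"max_tokens": 1024, "temperature": 0.3, "stop_sequences": ["END"]}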

Vendor-specific kwargs

Anthropic-only

| Kwarg | Purpose |
| --- | --- |
| top_k | Top-k sampling cutoff. (Also supported by Gemini.) |
| metadata | Caller-supplied metadata dict (e.g. {"user_id": "..."}) attached to the Anthropic request for billing/audit grouping. |
| output_config | Native structured-output primitive on Claude Sonnet 4.5+ / Opus 4.1+. Wire shape: {"format": {"type": "json_schema", "schema": {...}}}. Older models fall through to mesh's synthetic-tool path. |
| extra_headers / extra_query / extra_body | SDK-level escape hatches forwarded verbatim to the Anthropic client. |
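For example, a consumer could pass the output_config wire shape above through model_params on the Python surface described under Per-language usage below. The function name, schema, and metadata values here are illustrative:

import mesh

@mesh.llm(
    provider={"capability": "llm", "tags": ["+claude"]},
    model_params={
        "max_tokens": 1024,
        # Wire shape from the table above; the schema itself is illustrative.
        "output_config": {
            "format": {
                "type": "json_schema",
                "schema": {
                    "type": "object",
                    "properties": {"title": {"type": "string"}},
                    "required": ["title"],
                },
            }
        },
        "metadata": {"user_id": "tenant-42"},  # billing/audit grouping
    },
)
async def extract_title(prompt: str, llm: mesh.MeshLlmAgent = None) -> str:
    return await llm(prompt)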

OpenAI-only

| Kwarg | Purpose |
| --- | --- |
| n | Number of completions (note: mesh's response/stream adapters only consume the first candidate; use with care). |
| presence_penalty | Penalty for repeated topics. (Also Gemini.) |
| frequency_penalty | Penalty for repeated tokens. (Also Gemini.) |
| logit_bias | Per-token bias dict. |
| logprobs / top_logprobs | Return log-probabilities for sampled / top-k tokens. |
| parallel_tool_calls | Allow the model to emit multiple tool calls in a single turn. |
| user | End-user identifier for OpenAI abuse-monitoring. |
| reasoning_effort | o1 / o3 reasoning-model effort knob ("low", "medium", "high"). |
| max_completion_tokens | Newer name for max_tokens on reasoning models; both are accepted. |
| stream_options | OpenAI streaming options dict (mesh sets include_usage itself, but a caller-provided override is merged). |
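As an illustration, a consumer targeting an OpenAI reasoning model might combine several of these knobs. All names and values here are illustrative:

import mesh

@mesh.llm(
    provider={"capability": "llm", "tags": ["+openai"]},
    model_params={
        "max_completion_tokens": 8192,  # reasoning-model name for max_tokens
        "reasoning_effort": "high",     # o1 / o3 effort knob
        "seed": 42,                     # best-effort determinism
        "user": "tenant-42",            # abuse-monitoring identifier
    },
)
async def deep_analysis(prompt: str, llm: mesh.MeshLlmAgent = None) -> str:
    return await llm(prompt)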

Gemini-only

| Kwarg | Purpose |
| --- | --- |
| top_k | Top-k sampling cutoff. (Also Anthropic.) |
| presence_penalty / frequency_penalty | Repetition penalties. (Also OpenAI.) |
| thinking_config | Gemini 2.5+ thinking-budget control. Accepts a dict (e.g. {"thinking_budget": 0} to disable thinking) or a pre-built google.genai.types.ThinkingConfig instance. |
| response_mime_type | Response MIME type; set to "application/json" together with response_schema for JSON output. |
| response_schema | JSON schema for structured output. Mesh's HINT-mode workaround strips this when tools are present (Gemini API loop bug). |
| extra_headers / extra_body | Translated into per-call google.genai.types.HttpOptions overrides. |
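For example, JSON-structured output with thinking disabled could be requested like this (the function name and schema are illustrative):

import mesh

@mesh.llm(
    provider={"capability": "llm", "tags": ["+gemini"]},
    model_params={
        "max_tokens": 2048,
        "thinking_config": {"thinking_budget": 0},  # disable thinking (Gemini 2.5+)
        "response_mime_type": "application/json",   # pair with response_schema
        "response_schema": {                        # illustrative schema
            "type": "object",
            "properties": {"summary": {"type": "string"}},
        },
    },
)
async def summarize_json(prompt: str, llm: mesh.MeshLlmAgent = None) -> str:
    return await llm(prompt)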

Per-language usage

Python

model_params={...} is a parameter on @mesh.llm. Values are merged into the MeshLlmRequest.model_params dict at runtime and forwarded to the resolved provider.

import mesh

@mesh.llm(
    provider={"capability": "llm", "tags": ["+gemini"]},
    model_params={
        "max_tokens": 4096,
        "temperature": 0.3,
        "thinking_config": {"thinking_budget": 0},  # disable thinking for fast Gemini 2.5
    },
)
async def my_tool(prompt: str, llm: mesh.MeshLlmAgent = None) -> str:
    return await llm(prompt)

See MeshLlmRequest for the underlying request shape.

TypeScript

The TS SDK exposes a typed options surface on MeshLlmAgentConfig. Options are mapped to model_params keys on the wire (e.g. maxOutputTokens → max_tokens, topP → top_p).

import { MeshLlmAgent } from "@mcpmesh/sdk";

const agent = new MeshLlmAgent({
  functionId: "my_tool",
  provider: { capability: "llm", tags: ["+claude"] },
  model: "anthropic/claude-sonnet-4-5",
  maxIterations: 5,
  maxOutputTokens: 4096,
  temperature: 0.3,
  topP: 0.95,
  stop: ["\n\n---"],
  parallelToolCalls: true,
});

const result = await agent.run("Help me draft a release note", {
  tools: resolvedToolProxies,
  meshProvider: { endpoint: "http://provider:9000", functionName: "process_chat" },
});

For vendor-specific kwargs the typed surface doesn't expose (e.g. Gemini thinking_config, Anthropic output_config, OpenAI reasoning_effort), use the modelParams escape hatch on LlmCallOptions. The dict is merged into the wire model_params before typed fields, so typed options (maxOutputTokens, temperature, ...) win on collision and remain authoritative:

const reply = await llm.call(prompt, {
  maxOutputTokens: 4096,
  temperature: 0.3,
  modelParams: {
    thinking_config: { thinking_budget: 0 }, // escape hatch for vendor-specific kwargs
  },
});

Source: src/runtime/typescript/src/llm-agent.ts.

Java

The Java SDK uses the @MeshLlm annotation for tool-call defaults and a fluent builder on MeshLlmAgent for per-call overrides. Builder values are translated into model_params keys on the wire (maxTokens → max_tokens, topP → top_p).

import io.mcpmesh.types.MeshLlmAgent;
import io.mcpmesh.types.annotations.MeshLlm;
import io.mcpmesh.types.annotations.MeshTool;
import io.mcpmesh.types.annotations.Param;

@MeshLlm(
    providerSelector = "capability=llm,tags=+claude",
    maxTokens = 4096,
    temperature = 0.3
)
@MeshTool(capability = "summarizer")
public String summarize(@Param("text") String text, MeshLlmAgent llm) {
    return llm.request()
        .system("Summarize the following text in 2 sentences.")
        .user(text)
        .maxTokens(1024)        // override the annotation default for this call
        .temperature(0.5)
        .topP(0.95)
        .stop("END")
        .generate();
}

For vendor-specific kwargs the typed builder doesn't expose (e.g. Gemini thinking_config, Anthropic output_config, OpenAI reasoning_effort), use the .modelParams(...) escape hatch on the builder. The map is merged into the wire model_params before typed setters, so typed setters (maxTokens, temperature, ...) win on collision and remain authoritative:

String response = llm.request()
    .user(prompt)
    .maxTokens(4096)
    .temperature(0.3)
    .modelParams(Map.of(
        "thinking_config", Map.of("thinking_budget", 0)
    ))
    .generate();

The same builder surface — including .modelParams(...) — is available on the streaming path via .streamGenerate(). The merge semantics (escape hatch first, typed setters win, annotation defaults only when unset) are shared with the buffered .generate() path:

Flow.Publisher<String> chunks = llm.request()
    .system("You are helpful")
    .user(prompt)
    .maxTokens(4096)
    .temperature(0.7)
    .modelParams(Map.of(
        "thinking_config", Map.of("thinking_budget", 0)
    ))
    .streamGenerate();

Streaming requires the consumer to opt in via the ai.mcpmesh.stream tag on @MeshLlm(providerSelector = ...) — see Java LLM Integration.

Source: MeshLlmAgentProxy.java.

Reference matrix

| Kwarg | Anthropic | OpenAI | Gemini | Notes |
| --- | --- | --- | --- | --- |
| max_tokens | yes | yes | yes | Anthropic requires it; OpenAI/Gemini optional. |
| temperature | yes | yes | yes | |
| top_p | yes | yes | yes | |
| top_k | yes | no | yes | OpenAI has no equivalent. |
| stop | yes (stop_sequences) | yes | yes | Anthropic SDK renames; mesh translates. |
| seed | no | yes | yes | Anthropic ignores it. |
| presence_penalty | no | yes | yes | |
| frequency_penalty | no | yes | yes | |
| logit_bias | no | yes | no | |
| logprobs / top_logprobs | no | yes | no | |
| n | no | yes | no | Mesh assumes single-completion; multi-candidate output is dropped. |
| parallel_tool_calls | no | yes | no | Mesh's loop honors it for sequencing on Anthropic too. |
| user | no | yes | no | |
| reasoning_effort | no | yes (o1/o3) | no | |
| metadata | yes | no | no | Anthropic billing/audit grouping. |
| output_config | yes (Sonnet 4.5+ / Opus 4.1+) | no | no | Native structured output. |
| thinking_config | no | no | yes (2.5+) | Budget control for Gemini thinking models. |
| response_mime_type | no | no | yes | Pair with response_schema. |
| response_schema | no | no | yes | JSON schema for structured output. |
| extra_headers | yes | yes | yes | Vendor SDK escape hatch. |
| extra_body | yes | yes | yes | |
| extra_query | yes | yes | no | Gemini's HttpOptions has no per-call query override. |
| timeout / request_timeout | yes | yes | yes | Per-call timeout override in seconds. |

Source pointers

For the precise passthrough surface per adapter, see the _<VENDOR>_PASSTHROUGH_KWARGS frozenset at the top of each native client:

  • Anthropic: src/runtime/python/_mcp_mesh/engine/native_clients/anthropic_native.py
  • OpenAI: src/runtime/python/_mcp_mesh/engine/native_clients/openai_native.py
  • Gemini: src/runtime/python/_mcp_mesh/engine/native_clients/gemini_native.py

Any kwarg outside the passthrough and handled sets emits a once-per-key WARN, which is a useful diagnostic when debugging cross-vendor passthrough.

See also