The TripPlanner Tutorial

Build a production-grade multi-agent system with MCP Mesh

The TripPlanner Tutorial

MCP Mesh has a lot of surface area — decorators, dependency injection, capability-based discovery, LLM provider abstraction, tag routing, structured outputs, and thirty-odd more concepts beyond those. Reading about each one in isolation will only take you so far. At some point you need to see how they compose inside a real application, the kind of multi-user, cloud-deployable system that an enterprise-grade agent framework was built to support.

That’s what this tutorial is. Over ten chapters you’ll build TripPlanner, a multi-agent trip-planning application that is decidedly not a chatbot demo or a “hello, world.” It has tool agents for domain logic, LLM-driven planning, a committee of specialists that refine results, a chat API for end users, and a full deployment to Kubernetes with observability baked in. You’ll start on Day 1 with a single agent running locally, and by Day 10 every one of those pieces will be live — built by you, understood by you.

What you’ll have built by Day 10

By the end of the tutorial, TripPlanner consists of:

Five tool agents — flight search, hotel search, weather forecast, points of interest, and user preferences. Each runs as a standalone mesh agent and exposes one or more tools.
An LLM planner — an @mesh.llm agent driven by Jinja prompt templates. It uses the tool agents as dependencies and orchestrates an end-to-end trip plan.
Multiple LLM providers — Claude, GPT, and Gemini running simultaneously, with preference-based routing and automatic failover if one goes down.
A committee of three specialists — flight specialist, hotel specialist, and itinerary specialist — each an @mesh.llm agent, coordinated to refine the plan.
A FastAPI chat gateway — a stateless HTTP endpoint that accepts user messages and returns planner responses.
A cross-language gateway swap — a demonstration of replacing the FastAPI gateway with a Spring Boot gateway mid-tutorial. Same agents, same mesh, different language, everything works.
Redis-backed chat history — persistent, resumable conversations indexed by user and session.
Kubernetes deployment via Helm — the same agents running on a real cluster, with the registry as a service and agents as deployments.
An observability stack — Tempo for traces, Grafana dashboards, metrics on tool call latency, queue depth, and error rates.

The Day 10 architecture

graph TB
    User[User] --> Gateway[FastAPI Chat Gateway]
    Gateway --> Planner[LLM Planner]
    Gateway --> History[(Redis Chat History)]

    Planner --> Committee
    subgraph Committee[Committee of Specialists]
        FlightSpec[Flight Specialist]
        HotelSpec[Hotel Specialist]
        ItinSpec[Itinerary Specialist]
    end

    FlightSpec --> Flights[flight-agent]
    HotelSpec --> Hotels[hotel-agent]
    ItinSpec --> Weather[weather-agent]
    ItinSpec --> POI[poi-agent]
    Planner --> Prefs[user-prefs-agent]

    subgraph Observability
        Tempo[Tempo]
        Grafana[Grafana]
    end

    Planner -.traces.-> Tempo
    Committee -.traces.-> Tempo
    Tempo --> Grafana

Everything in that diagram runs on Kubernetes in the final chapter. The agents themselves are plain Python functions — no k8s-specific code, no sidecars, no framework-specific wiring.

The arc

The tutorial is ten chapters long, split into two parts.

Part 1 — Build and run (Days 1-5) starts from nothing and ends with a working TripPlanner running locally. You scaffold your first agent, learn how dependency injection works between tools, introduce tag-based routing, plug in an LLM with prompt templates, put a FastAPI gateway in front of it all, and then swap that gateway for Spring Boot to see cross-language interop in action.

Part 2 — Grow and scale (Days 6-10) takes the working system and grows it into something production-shaped. You add a committee of specialists to refine plans, wire Redis into the chat for persistent history, instrument everything with traces and metrics, deploy to Kubernetes via Helm, and finish with production hardening.

!!! info “All ten chapters are available” Days 1-10 are complete. Work through them at your own pace – each chapter builds on the previous one, from a single tool agent to a 13-agent system running on Kubernetes.

!!! note “Language coverage” This tutorial uses Python throughout. The patterns and concepts apply equally to TypeScript and Java — see the TypeScript SDK and Java SDK documentation for language-specific syntax.

Prerequisites

Before starting Day 1, you’ll need Python 3.11+, meshctl on your PATH, and a few minutes to set up a virtual environment. See the Prerequisites page for platform-specific install instructions.

Start Day 1

When you’re ready, head to Day 1 — Scaffold & first tool.

Things worth noticing along the way

As you work through the tutorial, keep an eye out for a few things we’re particularly proud of:

One codebase, every environment. The agent you write on Day 1 runs locally, in Docker, and on Kubernetes without any configuration changes.
mesh runs in-process. There are no sidecars or proxy containers to manage — your agent code is all you need to deploy.
Distributed calls feel like local function calls. Declare your dependencies, then call them — mesh injects the real implementations at runtime, whether they live in the same process or across the network. No REST clients, no MCP wiring, no response parsing. Your code reads like a plain Python script, which is why a complex multi-agent application can go from zero to running in half a day.
Day 1 code is Day 9 code. The function you write in the first tutorial is the same function that runs on Kubernetes later. Same file, same decorators, same types.
Switching LLM providers is zero code changes. Your agent declares a dependency on the llm capability — no vendor SDK, no provider-specific code. Swap Claude for GPT by bringing up a different provider agent; mesh abstracts away the API differences and your consumer auto-switches. With preference tags like +claude, you also get automatic failover — if Claude goes down, traffic routes to the next available provider with no downtime. Day 4 shows this in practice.

Prerequisites

What you need before starting Day 1 of the TripPlanner tutorial.

Supported platforms

macOS (Intel or Apple Silicon)
Linux (x86_64 or ARM64)
Windows via WSL2

meshctl

meshctl is the command-line tool you’ll use to start, inspect, and call agents.

npm install -g @mcpmesh/cli

Verify

meshctl --version

Language runtime

Python 3.11 or later

# Check your version
python3 --version

# Install if needed
brew install python@3.11          # macOS (Homebrew)
sudo apt install python3.11       # Ubuntu/Debian

Virtual environment

Create a .venv in your project root and install mcp-mesh into it. meshctl auto-detects .venv when starting an agent — you only need to activate it when running pip.

python3.11 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install mcp-mesh
deactivate

Verify

.venv/bin/python -c "import mesh; print('mesh OK')"

!!! note “Other languages” This tutorial uses Python. For TypeScript or Java setup, see the TypeScript prerequisites and Java prerequisites.

Ready to start

Once meshctl --version prints a version and .venv/bin/python -c "import mesh" succeeds, you’re ready for Day 1.

Day 1 — Scaffold and First Tool Agent

Today you’ll scaffold your first tool agent, run it locally, and call it from your terminal. By the end you’ll have used every core meshctl command. No LLMs yet — just the basics: build, start, inspect, call.

What we’re building today

graph LR
    Agent[flight-agent] -->|registers| Registry[Registry]
    You[You] -->|discovers agent| Registry
    You -->|meshctl call| Agent

A local registry and one agent. The agent registers with the registry so it can be discovered. When you run meshctl call, it looks up the agent’s endpoint via the registry and then calls the agent directly. (By default meshctl proxies the call through the registry for convenience — useful in Docker/K8s where you only port-forward the registry — but architecturally the registry is a discovery layer, not a routing layer.) The agent exposes a single tool, flight_search, that takes an origin, destination, and date and returns stub flight data. That’s the complete Day 1 mesh.

Step 1: Scaffold the agent

meshctl scaffold generates a ready-to-run agent from a built-in template. For a basic Python tool agent, the flags you need are --name, --agent-type tool, and --lang python (which is the default, so you can omit it).

$ meshctl scaffold --name flight-agent --agent-type tool --port 9101

Created agent 'flight-agent' in flight-agent/

Generated files:
  flight-agent/
  |-- .dockerignore
  |-- Dockerfile
  |-- README.md
  |-- __init__.py
  |-- __main__.py
  |-- helm-values.yaml
  |-- main.py
  |-- requirements.txt

Next steps:
  meshctl start flight-agent/main.py

For Docker/K8s deployment, see: meshctl man deployment

Everything mesh needs is in flight-agent/main.py. The scaffold also generates Docker and Helm files — you won’t need them today, but they’ll come in handy on Day 8 (Docker) and Day 9 (Kubernetes). The scaffold gives you a starting function named hello — you’re going to replace it with flight_search.

Step 2: Write the tool

A mesh tool is a plain Python function with two decorators: @app.tool() from FastMCP (which exposes it as an MCP tool) and @mesh.tool(...) from MCP Mesh (which registers it with the mesh and handles dependency injection). Here’s the flight_search function you’ll put in main.py:

> *See the source code in the day's example directory.*

Three parameters, a list of dicts back. The capability on @mesh.tool is how other agents will look this tool up once there are other agents — you’ll see that on Day 2. The tags are how the registry narrows matches when multiple agents advertise the same capability.

Here’s the complete main.py — imports, tool function, and agent class:

> *See the source code in the day's example directory.*

The @mesh.agent class at the bottom is what mesh uses to run the FastMCP server and register the agent with the registry. auto_run=True means you don’t need a main() — mesh starts the server when the module is imported by meshctl start.

!!! tip “meshctl DX: prerequisite detection” Before meshctl start actually runs anything, it checks that the language runtime and required packages are present. If something’s missing, it prints the exact commands you need to fix it and then exits — it won’t half-start a broken agent. Here’s what you’d see if Python’s .venv is missing:

```shell
$ meshctl start flight-agent/main.py
Validating prerequisites...

❌ Prerequisite check failed: Python environment

Python environment check failed: .venv not found in current directory

MCP Mesh requires a .venv directory in your current working directory.

Current directory: /home/you/trip-planner

To fix this issue:
  1. Navigate to your project directory (where your agents are)
  2. Create a virtual environment: python3.11 -m venv .venv
  3. Activate it: source .venv/bin/activate
  4. Install mcp-mesh: pip install mcp-mesh
  5. Run meshctl start from this directory

Run 'meshctl man prerequisite' for detailed setup instructions.
```

Same pattern for missing `mcp-mesh`, missing Node for TypeScript agents, or
missing Java/Maven for Java agents — `meshctl` tells you what's wrong and
what command to run next.

Step 3: Start the agent

With a .venv in place and mcp-mesh installed, start the agent in detached mode. If no registry is running, meshctl starts one automatically on port 8000.

$ meshctl start flight-agent/main.py -d
Validating prerequisites...
  Using virtual environment: /tmp/trip-planner-day1/.venv/bin/python
  All prerequisites validated successfully
   Python: 3.11.14 (/tmp/trip-planner-day1/.venv/bin/python)
   Virtual environment: .venv
Started 'flight-agent' in detach
Logs: ~/.mcp-mesh/logs/flight-agent.log
Use 'meshctl logs flight-agent' to view or 'meshctl stop flight-agent' to stop

meshctl auto-detected the .venv and started the agent in detached mode. The registry was started automatically — no separate command needed. Logs are stored at ~/.mcp-mesh/logs/flight-agent.log and viewable with meshctl logs flight-agent.

Step 4: Start the UI

meshctl ships a web dashboard for inspecting agents, tools, and traces. Start it alongside your agent:

$ meshctl start --ui -d
Started in detach
Use 'meshctl logs <agent>' to view logs or 'meshctl stop' to stop

The dashboard is available at http://localhost:3080. Open it in your browser and you’ll see flight-agent listed with its status and capabilities.

Mesh UI showing flight-agent on the Topology page

Step 5: Inspect the mesh

meshctl list shows you what’s running:

$ meshctl list
Registry: running (http://localhost:8000) - 1 healthy

NAME                    RUNTIME        TYPE    STATUS       DEPS     ENDPOINT                AGE      LAST SEEN
--------------------------------------------------------------------------------------------------------------------------
flight-agent-ba2b3bc8   Python         Agent   healthy      0/0      10.0.0.74:9101          53s      3s

The agent registers as flight-agent-ba2b3bc8 — mesh appends a short hash to ensure uniqueness when multiple instances of the same agent run. All meshctl commands accept the prefix flight-agent for convenience, so you never need to type the hash.

The DEPS column is 0/0 because flight-agent doesn’t depend on any other agent. When you add hotel and weather agents on Day 2, this column will show resolved-over-declared dependencies and turn green when all dependencies are satisfied.

meshctl list --tools shows every tool registered across all agents:

$ meshctl list --tools
TOOL                      AGENT                   CAPABILITY           TAGS
----------------------------------------------------------------------------------------
flight_search             flight-agent-ba2b3bc8   flight_search        flights,travel

1 tool(s) found

And meshctl status flight-agent gives you a detailed breakdown — capabilities, endpoint, version, uptime:

$ meshctl status flight-agent
Agent Details: flight-agent-ba2b3bc8
================================================================================
Name                : flight-agent-ba2b3bc8
Type                : Agent
Runtime             : Python
Status              : healthy
Endpoint            : http://10.0.0.74:9101
Version             : 1.0.0
Dependencies        : 0/0
Last Seen           : 2026-04-12 05:29:01 (3s ago)
Created             : 2026-04-12 01:28:06

Capabilities (1):
--------------------------------------------------------------------------------
CAPABILITY                MCP TOOL                       VERSION    TAGS
--------------------------------------------------------------------------------
flight_search             flight_search                  1.0.0      flights,travel

Step 6: Call the tool

meshctl call discovers the agent via the registry and sends an MCP JSON-RPC tools/call to it. You pass the tool name and a JSON object with the arguments:

$ meshctl call flight_search '{"origin":"SFO","destination":"NRT","date":"2026-06-01"}'

{
  "_meta": {
    "fastmcp": {
      "wrap_result": true
    }
  },
  "content": [
    {
      "type": "text",
      "text": "[{\"carrier\":\"MH\",\"flight\":\"MH007\",\"origin\":\"SFO\",\"destination\":\"NRT\",\"date\":\"2026-06-01\",\"depart\":\"09:15\",\"arrive\":\"14:40\",\"price_usd\":842},{\"carrier\":\"SQ\",\"flight\":\"SQ017\",\"origin\":\"SFO\",\"destination\":\"NRT\",\"date\":\"2026-06-01\",\"depart\":\"11:50\",\"arrive\":\"17:05\",\"price_usd\":901}]"
    }
  ],
  "structuredContent": {
    "result": [
      {
        "carrier": "MH",
        "flight": "MH007",
        "origin": "SFO",
        "destination": "NRT",
        "date": "2026-06-01",
        "depart": "09:15",
        "arrive": "14:40",
        "price_usd": 842
      },
      {
        "carrier": "SQ",
        "flight": "SQ017",
        "origin": "SFO",
        "destination": "NRT",
        "date": "2026-06-01",
        "depart": "11:50",
        "arrive": "17:05",
        "price_usd": 901
      }
    ]
  },
  "isError": false
}

The response is a standard MCP tool result envelope. The flight data you care about is under structuredContent.result — two flights matching the stub data from your flight_search function. The content field contains the same data as a JSON string (the MCP text format), and _meta is FastMCP internal metadata. When other agents call this tool via dependency injection, mesh parses structuredContent automatically — they receive the Python list directly.

meshctl call discovers the agent’s endpoint via the registry and calls it. By default it proxies through the registry for convenience — this is especially useful in Kubernetes where you only need to port-forward the registry. You can call the agent directly with --use-proxy=false for debugging.

Stop and clean up

One command stops the registry, the agent, and any other background processes meshctl is tracking:

$ meshctl stop
Stopping 1 agent(s) in parallel...
Stopping agent 'flight-agent' (PID: 14560)...
Agent 'flight-agent' stopped
Stopping UI server (PID: 15245)...
UI server stopped
Stopping registry (PID: 14555)...
Registry stopped

Stopped 3 process(es)

Troubleshooting

Agent name has a hash suffix. Your agent registers as flight-agent-XXXXXXXX (name plus a random hash). This ensures uniqueness when you run multiple instances. All meshctl commands accept just the prefix (flight-agent) — you never need to type the hash.

Warning about McpMeshTool parameters in logs. If you check meshctl logs flight-agent, you may see a warning: Function '__main__.flight_search' has 3 parameters but none are typed as McpMeshTool. Skipping injection of 0 dependencies. This is harmless — it means your tool has no mesh dependencies to inject, which is expected on Day 1. The warning disappears once you add dependencies on Day 2.

meshctl stop reports a failed UI process. If meshctl stop reports Failed to stop UI server, it usually means a previous UI process is still running. Run ps aux | grep meshui to find it and kill <PID> to clean it up.

Port 8000 already in use. If meshctl start fails because port 8000 is taken, another service (or a previous registry) is using it. Stop the other service, or set a different port with MCP_MESH_REGISTRY_PORT=9000 meshctl start ....

Recap

You built, started, inspected, and called an agent using six meshctl commands and a dozen lines of Python. The flight_search function you wrote today is the same function that will run on Kubernetes on Day 9 — same file, same decorators, same types, no wrapper code or deployment-specific edits. That’s DDDI: the agent doesn’t know or care where it’s running, and you get dev-to-production with nothing in between.

Next up

Day 2 — More Tools and Dependency Injection adds four more tool agents and introduces dependency injection between them — the flight_search tool will start asking for user preferences from another agent, and you’ll see how mesh resolves and injects those dependencies at runtime.

Day 2 — More Tools and Dependency Injection

Yesterday you built one agent. Today you’ll build four more, connect them via dependency injection, and see mesh resolve dependencies at runtime. By the end you’ll have five agents working together — and you won’t have written a single line of networking code.

What we’re building today

graph LR
    FA[flight-agent] -->|depends on| UPA[user-prefs-agent]
    PA[poi-agent] -->|depends on| WA[weather-agent]
    HA[hotel-agent]
    UPA
    WA

    style FA fill:#4a9eff,color:#fff
    style PA fill:#4a9eff,color:#fff
    style UPA fill:#1a8a4a,color:#fff
    style WA fill:#1a8a4a,color:#fff
    style HA fill:#1a8a4a,color:#fff

Five agents. Two dependency arrows. flight-agent calls user-prefs-agent to personalize results. poi-agent calls weather-agent to recommend indoor or outdoor activities. The other three — hotel-agent, weather-agent, and user-prefs-agent — are standalone tools with no dependencies.

Step 1: Scaffold the new agents

You know meshctl scaffold from Day 1. Scaffold four new agents:

$ meshctl scaffold --name hotel-agent --agent-type tool --port 9102
$ meshctl scaffold --name weather-agent --agent-type tool --port 9103
$ meshctl scaffold --name poi-agent --agent-type tool --port 9104
$ meshctl scaffold --name user-prefs-agent --agent-type tool --port 9105

Each command creates the same set of files you saw on Day 1: main.py, Dockerfile, helm-values.yaml, and the rest. You’ll replace the generated main.py in each directory with the tool implementations below.

Step 2: Write the tools

Standalone tools: hotel, weather, user-prefs

These three agents have no dependencies. Each registers a single tool with the mesh.

hotel-agent — searches for hotels at a destination:

> *See the source code in the day's example directory.*

weather-agent — returns a weather forecast:

> *See the source code in the day's example directory.*

user-prefs-agent — returns user travel preferences:

> *See the source code in the day's example directory.*

All three follow the same pattern from Day 1: @app.tool() + @mesh.tool() with a capability name and tags. No dependencies, no injected parameters.

DI tools: flight-agent (updated) and poi-agent (new)

These two agents depend on other agents’ capabilities. This is where dependency injection comes in.

flight-agent — updated from Day 1 to depend on user_preferences:

> *See the source code in the day's example directory.*

Three things changed from Day 1:

dependencies=["user_preferences"] on @mesh.tool declares that this tool needs the user_preferences capability at runtime.
user_prefs: mesh.McpMeshTool = None is the injected parameter. At startup, mesh resolves the dependency by finding an agent that advertises user_preferences, creates a proxy, and injects it here.
await user_prefs(user_id="demo-user") calls the injected tool like a regular async function. No URL, no REST client, no serialization code — mesh handles all of that behind the proxy.

The function also changed from def to async def — dependency injection calls are async because they cross process boundaries.

poi-agent — depends on weather_forecast:

> *See the source code in the day's example directory.*

Same pattern: declare the dependency in @mesh.tool, accept an mesh.McpMeshTool parameter, and call it with await. The search_pois function fetches the weather forecast, checks the rain chance, and adjusts its recommendations — indoor activities if rain is likely, outdoor otherwise.

Here’s the complete flight-agent/main.py for reference:

> *See the source code in the day's example directory.*

Step 3: Start all agents

Start all five with one command:

$ meshctl start --debug -d -w flight-agent/main.py hotel-agent/main.py weather-agent/main.py poi-agent/main.py user-prefs-agent/main.py

Validating prerequisites...
  Using virtual environment: /tmp/trip-planner-day2/.venv/bin/python
  All prerequisites validated successfully
   Python: 3.11.14 (/tmp/trip-planner-day2/.venv/bin/python)
   Virtual environment: .venv
Starting 5 agents in detach: flight-agent, hotel-agent, weather-agent, poi-agent, user-prefs-agent
Logs: ~/.mcp-mesh/logs/<agent>.log
Use 'meshctl logs <agent>' to view or 'meshctl stop' to stop all

The -w flag means mesh is watching your agent files — edit any main.py, save it, and mesh restarts that agent automatically. Combined with -d (detach) and --debug (verbose logs), this gives you a tight development loop: edit, save, call, see results.

Here’s what each flag does:

--debug — verbose logging. Useful for seeing dependency resolution.
-d — detach mode. All five agents run in the background.
-w — watch mode. Monitors agent directories and auto-restarts on changes.

If no registry is running, meshctl starts one automatically, same as Day 1.

Step 4: Start the UI

$ meshctl start --ui -d

The dashboard is at http://localhost:3080. You’ll see all five agents listed.

Mesh UI Topology showing five agents with dependency edges

Step 5: Inspect the mesh

$ meshctl list
Registry: running (http://localhost:8000) - 5 healthy

NAME                        RUNTIME   TYPE    STATUS    DEPS   ENDPOINT           AGE   LAST SEEN
flight-agent-835864a0       Python    Agent   healthy   1/1    10.0.0.74:63297    5s    5s
hotel-agent-eb0eb637        Python    Agent   healthy   0/0    10.0.0.74:63298    5s    5s
poi-agent-5923d848          Python    Agent   healthy   1/1    10.0.0.74:63295    5s    5s
user-prefs-agent-950b70c3   Python    Agent   healthy   0/0    10.0.0.74:63294    5s    5s
weather-agent-1760466a      Python    Agent   healthy   0/0    10.0.0.74:63296    5s    5s

Notice the DEPS column. flight-agent shows 1/1 — one dependency declared, one resolved. poi-agent also shows 1/1. The others show 0/0. When all dependencies are resolved, the agent is fully operational.

List the tools:

$ meshctl list --tools
TOOL              AGENT                       CAPABILITY         TAGS
flight_search     flight-agent-835864a0       flight_search      flights,travel
get_user_prefs    user-prefs-agent-950b70c3   user_preferences   preferences,travel
get_weather       weather-agent-1760466a      weather_forecast   weather,travel
hotel_search      hotel-agent-eb0eb637        hotel_search       hotels,travel
search_pois       poi-agent-5923d848          poi_search         poi,travel

5 tool(s) found

Five tools across five agents. Each tool’s capability name is how other agents find it via dependency injection.

Step 6: Call a tool with dependency injection

Call flight_search. This triggers a cross-agent call — flight-agent calls user-prefs-agent behind the scenes to fetch user preferences:

$ meshctl call flight_search '{"origin":"SFO","destination":"NRT","date":"2026-06-01"}'

The response includes personalized results. The stub preferences set a budget of $1000 and prefer SQ and MH airlines, so the $1150 AA flight is filtered out, and the preferred carriers sort first:

{
  "_meta": {
    "fastmcp": {
      "wrap_result": true
    }
  },
  "content": [
    {
      "type": "text",
      "text": "[{\"carrier\":\"MH\",\"flight\":\"MH007\",\"origin\":\"SFO\",\"destination\":\"NRT\",\"date\":\"2026-06-01\",\"depart\":\"09:15\",\"arrive\":\"14:40\",\"price_usd\":842},{\"carrier\":\"SQ\",\"flight\":\"SQ017\",\"origin\":\"SFO\",\"destination\":\"NRT\",\"date\":\"2026-06-01\",\"depart\":\"11:50\",\"arrive\":\"17:05\",\"price_usd\":901}]"
    }
  ],
  "structuredContent": {
    "result": [
      {
        "carrier": "MH",
        "flight": "MH007",
        "origin": "SFO",
        "destination": "NRT",
        "date": "2026-06-01",
        "depart": "09:15",
        "arrive": "14:40",
        "price_usd": 842
      },
      {
        "carrier": "SQ",
        "flight": "SQ017",
        "origin": "SFO",
        "destination": "NRT",
        "date": "2026-06-01",
        "depart": "11:50",
        "arrive": "17:05",
        "price_usd": 901
      }
    ]
  },
  "isError": false
}

Now call search_pois. This triggers poi-agent calling weather-agent:

$ meshctl call search_pois '{"location":"Tokyo"}'

{
  "content": [
    {
      "type": "text",
      "text": "{\"location\":\"Tokyo\",\"weather_summary\":\"Partly cloudy in Tokyo on today, 28C high, 30% chance of rain.\",\"recommendation\":\"Weather looks good — outdoor activities recommended.\",\"pois\":[{\"name\":\"Senso-ji Temple\",\"type\":\"outdoor\",\"category\":\"cultural\",\"location\":\"Tokyo\"},{\"name\":\"Ueno Park\",\"type\":\"outdoor\",\"category\":\"nature\",\"location\":\"Tokyo\"},{\"name\":\"Meiji Shrine\",\"type\":\"outdoor\",\"category\":\"cultural\",\"location\":\"Tokyo\"},{\"name\":\"TeamLab Borderless\",\"type\":\"indoor\",\"category\":\"art\",\"location\":\"Tokyo\"}]}"
    }
  ],
  "structuredContent": {
    "location": "Tokyo",
    "weather_summary": "Partly cloudy in Tokyo on today, 28C high, 30% chance of rain.",
    "recommendation": "Weather looks good — outdoor activities recommended.",
    "pois": [
      {"name": "Senso-ji Temple", "type": "outdoor", "category": "cultural", "location": "Tokyo"},
      {"name": "Ueno Park", "type": "outdoor", "category": "nature", "location": "Tokyo"},
      {"name": "Meiji Shrine", "type": "outdoor", "category": "cultural", "location": "Tokyo"},
      {"name": "TeamLab Borderless", "type": "indoor", "category": "art", "location": "Tokyo"}
    ]
  },
  "isError": false
}

The 30% rain chance is below the 50% threshold, so poi-agent recommends outdoor activities. Change the stub data in weather-agent to return 80% rain chance, save the file (watch mode restarts it automatically), and call again — you’ll get indoor recommendations instead.

!!! tip “meshctl DX — watch mode” Edit your flight_search function, save the file, and mesh auto-restarts the agent. No manual stop/start cycle. Combined with -d, you get a development loop that feels like editing a local script — change, save, call, see results.

!!! info “What is DDDI?” Your flight_search function calls user_prefs() like a local function. It has no idea that user_prefs lives in a different process, possibly on a different machine. mesh resolved the dependency by matching the user_preferences capability name, injected a proxy that handles the network call, and your code stayed clean. That’s Distributed Dynamic Dependency Injection — DDDI.

Stop and clean up

$ meshctl stop

On Day 3 you’ll restart with distributed tracing enabled — the agents need the --dte flag to publish trace events, so a fresh start is needed.

Troubleshooting

“Dependency not resolved” — agent shows 0/1 in DEPS column. This means the agent that provides the required capability hasn’t registered yet. mesh doesn’t crash — the dependent agent starts and waits. Once the provider agent registers, mesh resolves the dependency and the DEPS column updates to 1/1. If you start agents one at a time, you may see this briefly. Starting all agents together (as in Step 3) avoids it in practice.

DI call returns empty dict instead of preferences. Check that user_prefs is not None. The if user_prefs else {} guard in the function handles the case where the dependency wasn’t resolved. If it’s consistently None, check meshctl status flight-agent to verify the dependency is resolved.

Watch mode doesn’t pick up changes. Verify that the file you edited is in the same directory that meshctl start is watching. Watch mode monitors the directory of the main.py file you passed to meshctl start.

Agent ports change on every restart. When using -w (watch mode), meshctl starts agents with the HTTP port set to 0 — the OS assigns a random available port. This is intentional: when watch mode restarts an agent after a code change, the old process needs to release its port before the new one starts. Since mesh discovers agents by capability name through the registry (not by URL), the actual port number doesn’t matter. meshctl call and dependency injection both resolve endpoints via the registry, so everything works regardless of which port an agent lands on.

Recap

You built five agents, connected two of them via dependency injection, and called tools that trigger cross-agent calls. The total networking code you wrote: zero lines. The dependency injection, service discovery, and proxy creation all happened at runtime — declared in decorators, resolved by mesh.

Next up

Day 3 sets up the observability stack for distributed tracing, then adds an LLM provider agent and a planner — your first agent that can reason, not just return data.

Day 3 – Observability and LLM Integration

On Day 2 you built five tool agents with dependency injection. Today you’ll restart them with distributed tracing enabled, add an LLM provider, and build your first agent that can reason – a trip planner that generates itineraries from natural language.

What we’re building today

graph LR
    FA[flight-agent] -->|depends on| UPA[user-prefs-agent]
    PA[poi-agent] -->|depends on| WA[weather-agent]
    HA[hotel-agent]
    PL[planner-agent] -->|uses LLM| CP[claude-provider]

    style FA fill:#4a9eff,color:#fff
    style PA fill:#4a9eff,color:#fff
    style UPA fill:#1a8a4a,color:#fff
    style WA fill:#1a8a4a,color:#fff
    style HA fill:#1a8a4a,color:#fff
    style CP fill:#9b59b6,color:#fff
    style PL fill:#9b59b6,color:#fff

Seven agents. The five you already know (blue and green) plus two new ones in purple: claude-provider wraps the Claude API as a mesh capability, and planner-agent consumes that capability to generate trip itineraries. The planner connects to the provider through the same capability-based discovery that flight-agent uses to find user-prefs-agent – no hardcoded URLs, no model-specific code in the planner.

Today has five parts:

Set up distributed tracing – Redis, Tempo, Grafana via Docker Compose
Register an LLM provider – wrap Claude as a mesh capability
Build the planner agent – consume the LLM via prompt templates
Call the planner – generate a Kyoto itinerary
Walk the trace – see the full call tree across agents

Part 1: Set up distributed tracing

Mesh agents publish trace events to Redis. The registry consumes those events and exports them to Tempo. You view traces with meshctl trace or in Grafana. Before any of that works, you need the observability stack running.

Generate the compose file

$ meshctl scaffold --observability

This generates a docker-compose.observability.yml with Redis, Tempo, and Grafana, plus the supporting config files (Tempo config, Grafana provisioning).

Start the stack

$ docker compose -f docker-compose.observability.yml up -d

 Container trip-planner-redis   Started
 Container trip-planner-tempo   Started
 Container trip-planner-grafana Started

Verify everything is healthy:

$ docker compose -f docker-compose.observability.yml ps
NAME                   STATUS
trip-planner-redis     Up (healthy)
trip-planner-tempo     Up (healthy)
trip-planner-grafana   Up (healthy)

Three containers. Redis collects trace events on port 6379, Tempo stores traces on ports 3200 (HTTP) and 4317 (OTLP gRPC), and Grafana serves dashboards on port 3000.

Part 2: Register an LLM provider

!!! note “API key required” The LLM provider needs an ANTHROPIC_API_KEY environment variable. If you don’t have one, create one here and export it: export ANTHROPIC_API_KEY=sk-ant-...

An LLM provider wraps an external LLM API – Claude, GPT, Gemini – as a mesh capability. Other agents discover it by capability name, the same way tool agents discover each other. The provider agent is zero-code: the @mesh.llm_provider decorator handles the LiteLLM integration, request parsing, and response formatting.

Scaffold the provider

$ meshctl scaffold llm-provider --vendor claude --lang python --name claude-provider --port 9106

Replace the generated main.py with:

> *See the source code in the day's example directory.*

The decorator does all the work:

model="anthropic/claude-sonnet-4-5" – the LiteLLM model identifier. LiteLLM routes this to the Anthropic API using your ANTHROPIC_API_KEY.
capability="llm" – the capability name other agents use to discover this provider.
tags=["claude"] – tags for filtering. On Day 4 you’ll add GPT and Gemini providers with different tags and select between them.

The function body is pass – the decorator generates the full implementation.

Start the provider with all Day 2 agents

Day 2 ended with meshctl stop, so start the five tool agents alongside the new provider – this time with --dte to enable distributed tracing:

$ meshctl start --dte --debug -d -w flight-agent/main.py hotel-agent/main.py weather-agent/main.py poi-agent/main.py user-prefs-agent/main.py claude-provider/main.py

Starting 6 agents in detach: flight-agent, hotel-agent, weather-agent, poi-agent, user-prefs-agent, claude-provider
Logs: ~/.mcp-mesh/logs/<agent>.log
Use 'meshctl logs <agent>' to view or 'meshctl stop' to stop all

Check that all six registered:

$ meshctl list
Registry: running (http://localhost:8000) - 6 healthy

NAME                        RUNTIME   TYPE    STATUS    DEPS   ENDPOINT           AGE   LAST SEEN
claude-provider-a8eb909e    Python    Agent   healthy   0/0    10.0.0.74:65349    5s    0s
flight-agent-be1924a4       Python    Agent   healthy   1/1    10.0.0.74:65350    5s    0s
hotel-agent-f8830ef1        Python    Agent   healthy   0/0    10.0.0.74:65354    5s    0s
poi-agent-801db357          Python    Agent   healthy   1/1    10.0.0.74:65351    5s    0s
user-prefs-agent-bfa9de39   Python    Agent   healthy   0/0    10.0.0.74:65353    5s    0s
weather-agent-0aed0742      Python    Agent   healthy   0/0    10.0.0.74:65355    5s    0s

Six agents. The five tool agents from Day 2 plus the new provider. The --dte flag enables distributed tracing for all of them – every cross-agent call now publishes trace events to Redis.

Mesh UI Topology showing seven agents with LLM provider connections

Part 3: Build the planner agent

The planner agent uses @mesh.llm to consume an LLM capability from the mesh. It takes a destination, dates, and budget, feeds them into a Jinja prompt template, and returns an LLM-generated itinerary.

The prompt template

Create planner-agent/prompts/plan_trip.j2:

> *See the source code in the day's example directory.*

The template variables – {{ destination }}, {{ dates }}, {{ budget }} – are populated from the context model at call time.

The planner code

Scaffold the agent, then replace main.py:

$ meshctl scaffold --name planner-agent --agent-type llm-agent --port 9107

> *See the source code in the day's example directory.*

Three things to note:

TripRequest(MeshContextModel) defines the context fields that map to template variables. Each field becomes a tool parameter and a template variable.
system_prompt="file://prompts/plan_trip.j2" loads the Jinja template from disk. At call time, mesh renders the template with the context fields and passes the result as the system prompt to the LLM.
provider={"capability": "llm"} tells mesh to find any agent that advertises the llm capability. Right now that’s claude-provider. The planner doesn’t know or care which model is behind that capability.

The llm parameter is injected by mesh, just like mesh.McpMeshTool in DI. Calling await llm(...) sends the user message plus the rendered system prompt to the resolved LLM provider.

Start the planner

$ meshctl start --dte --debug -d -w planner-agent/main.py

Check the full mesh:

$ meshctl list
Registry: running (http://localhost:8000) - 7 healthy

NAME                        RUNTIME   TYPE    STATUS    DEPS   ENDPOINT           AGE   LAST SEEN
claude-provider-a8eb909e    Python    Agent   healthy   0/0    10.0.0.74:65349    57s   2s
flight-agent-be1924a4       Python    Agent   healthy   1/1    10.0.0.74:65350    57s   2s
hotel-agent-f8830ef1        Python    Agent   healthy   0/0    10.0.0.74:65354    57s   2s
planner-agent-2efb4dce      Python    Agent   healthy   0/0    10.0.0.74:65352    57s   2s
poi-agent-801db357          Python    Agent   healthy   1/1    10.0.0.74:65351    57s   2s
user-prefs-agent-bfa9de39   Python    Agent   healthy   0/0    10.0.0.74:65353    57s   2s
weather-agent-0aed0742      Python    Agent   healthy   0/0    10.0.0.74:65355    57s   2s

Seven agents. List the tools:

$ meshctl list --tools
TOOL                      AGENT                       CAPABILITY           TAGS
--------------------------------------------------------------------------------------------
claude_provider           claude-provider-a8eb909e    llm                  claude
flight_search             flight-agent-be1924a4       flight_search        flights,travel
get_user_prefs            user-prefs-agent-bfa9de39   user_preferences     preferences,travel
get_weather               weather-agent-0aed0742      weather_forecast     weather,travel
hotel_search              hotel-agent-f8830ef1        hotel_search         hotels,travel
plan_trip                 planner-agent-2efb4dce      trip_planning        planner,travel,llm
search_pois               poi-agent-801db357          poi_search           poi,travel

7 tool(s) found

Seven tools. Notice claude_provider with capability llm and plan_trip with capability trip_planning.

Start the UI

$ meshctl start --ui -d

Open http://localhost:3080 to see all seven agents in the dashboard. The two new agents – claude-provider and planner-agent – appear alongside the five from Day 2.

Part 4: Call the planner

$ meshctl call plan_trip '{"destination":"Kyoto","dates":"June 1-5, 2026","budget":"$2000"}' --trace

The --trace flag tells meshctl to display the trace ID after the response. The response is an LLM-generated itinerary:

{
  "structuredContent": {
    "result": "# Kyoto Itinerary: June 1-5, 2026 | Budget: $2,000\n\n## Budget Breakdown\n- Accommodation (4 nights): ~$400\n- Food: ~$400\n- Transportation: ~$100\n- Activities: ~$150\n- Reserve: ~$950\n\n## Day 1 - June 1 (Arrival & Eastern Kyoto)\nMorning: Arrive, check in (Gion area). Get ICOCA transit card.\nAfternoon: Kiyomizu-dera Temple -> Ninenzaka & Sannenzaka streets.\nEvening: Stroll through Gion district.\nRestaurant: Gion Kappa - kaiseki sets (~$30-40)\n\n## Day 2 - June 2 (Arashiyama)\nMorning: Bamboo Grove -> Tenryu-ji Temple.\nAfternoon: Monkey Park Iwatayama -> Togetsukyo Bridge.\nEvening: Pontocho Alley.\nRestaurant: Arashiyama Yoshimura - soba (~$15-20)\n\n..."
  },
  "isError": false
}

Trace ID: 2bb20ffe16ff3e03ff356aada9d11947
View trace: meshctl trace 2bb20ffe16ff3e03ff356aada9d11947

Here’s the call flow:

meshctl call discovers plan_trip via the registry and sends your JSON arguments to planner-agent.
planner-agent populates TripRequest from the arguments, renders plan_trip.j2 with destination="Kyoto", dates="June 1-5, 2026", budget="$2000", and sets it as the system prompt.
await llm(...) resolves the llm capability to claude-provider and sends the system prompt plus user message.
claude-provider calls the Anthropic API via LiteLLM and returns the generated text.
The itinerary flows back through the planner to your terminal.

You wrote no HTTP client code, no API key management in the planner, no routing logic. The planner knows what it needs (an LLM), not where to find it.

Part 5: Walk the trace

Now that the observability stack is running, you can inspect the full call tree. Copy the trace ID from the output above:

$ meshctl trace 2bb20ffe16ff3e03ff356aada9d11947

Call Tree for trace 2bb20ffe16ff3e03ff356aada9d11947

└─ plan_trip (planner-agent) [21835ms]
   └─ claude_provider (claude-provider) [21812ms]

Summary: 3 spans across 2 agents | 21.84s
Agents: claude-provider, planner-agent

The trace tree shows exactly what happened:

plan_trip (planner-agent) – the entry point. Received your JSON arguments, rendered the Jinja template, and delegated to the LLM provider.
claude_provider (claude-provider) – the LLM provider. Received the rendered prompt, called the Anthropic API via LiteLLM, and returned the generated itinerary.

The total time (~22 seconds) is almost entirely Claude’s inference time. The mesh overhead – discovery, routing, serialization – is in the low milliseconds.

The Traffic page in the mesh UI tracks this automatically – per-edge latency, error rates, token usage by model, and data transferred per agent. No instrumentation code needed; mesh collects it from the trace data.

Mesh UI Traffic page showing per-edge latency, token usage, and per-agent stats

In Grafana at http://localhost:3000, you can drill into each span, see request/response payloads, and visualize latency in a waterfall chart. Navigate to Explore and select the Tempo datasource to search for traces.

Grafana Tempo trace view showing planner-agent to claude-provider call

This is the payoff for the observability setup at the start of the chapter. From now on, every meshctl call --trace gives you a trace ID, and meshctl trace <id> shows the full call tree across all agents involved. As your mesh grows, traces will span more agents – on Day 4 when the planner calls tool agents, the trace tree will show the full chain from planner to LLM to tool agents and back.

!!! tip “Trace propagation” Trace context propagates automatically across mesh calls. When planner-agent calls claude-provider, mesh injects trace headers so the provider’s spans link back to the planner’s span. You don’t need to pass trace IDs manually.

!!! info “LLM provider abstraction” The planner declares a dependency on the llm capability – it has no idea it’s talking to Claude. On Day 4 you’ll add GPT and Gemini providers and swap between them by changing a tag. The planner’s code won’t change.

Leave it running

From here on, your agents stay running between chapters. On Day 4 you’ll add more LLM providers and introduce provider tiers – just start the new agents with --dte and they join the existing mesh.

Keep the observability stack running too (docker compose stays up). Traces from Day 4 calls will appear in the same Grafana instance.

If you do need to stop for any reason, meshctl stop shuts down all agents, and docker compose -f docker-compose.observability.yml down stops the observability stack.

Troubleshooting

Docker not running / compose fails. The observability stack runs in Docker. Make sure Docker Desktop (or your Docker daemon) is running before docker compose -f docker-compose.observability.yml up -d. If ports 6379, 3200, or 3000 are already in use, stop the conflicting services or change the ports in docker-compose.observability.yml.

ANTHROPIC_API_KEY not set. The claude-provider agent needs an Anthropic API key. Set it in your environment:

$ export ANTHROPIC_API_KEY=sk-ant-...

If the key is missing, the provider will start but LLM calls will fail with an authentication error.

Traces not appearing. Check two things:

Agents were started with --dte (or MCP_MESH_DISTRIBUTED_TRACING_ENABLED=true).
Redis is reachable at redis://localhost:6379 (run redis-cli ping).

If you started agents without --dte, stop them with meshctl stop and restart with the flag.

Observability stack on non-default ports. If you’re running Redis, Tempo, or Grafana on non-standard ports (because the defaults are already in use), set the corresponding environment variables before starting agents:

export REDIS_URL=redis://localhost:6380          # default: 6379
export TELEMETRY_ENDPOINT=localhost:4318         # default: 4317
export TEMPO_URL=http://localhost:3201           # default: 3200

meshctl trace returns “trace not found”. Traces take a few seconds to propagate from Redis through the registry to Tempo. Wait 5-10 seconds after the call completes, then try again. You can also pass --retries 5 to have meshctl retry automatically.

Recap

You stood up an observability stack (Redis, Tempo, Grafana), registered a zero-code LLM provider, built a planner agent that generates itineraries via prompt templates, and traced the full call tree across agents. The planner consumed the LLM capability the same way flight-agent consumes user_preferences – by declaring what it needs, not where to find it.

Next up

Day 4 adds a second LLM provider (GPT), introduces tag-based provider selection with automatic failover, and connects the planner to your tool agents so it can look up real flight and hotel data while generating itineraries.

Day 4 – Multiple Providers and Dependency Tiers

Your planner works, but it’s locked to one LLM provider and generates plans from imagination. Today you’ll add a second LLM provider, introduce preference-based routing with automatic failover, and connect the planner to your tool agents so it plans with real flight and hotel data.

What we’re building today

graph LR
    subgraph Providers
        CP[claude-provider]
        OP[openai-provider]
    end

    subgraph Tool Agents
        FA[flight-agent] -->|depends on| UPA[user-prefs-agent]
        PA[poi-agent] -->|depends on| WA[weather-agent]
        HA[hotel-agent]
    end

    PL[planner-agent] -.->|"+claude" preference| CP
    PL -.->|failover| OP
    PL ==>|tier-1 prefetch| UPA
    PL -.->|tier-2 LLM tools| FA
    PL -.->|tier-2 LLM tools| HA
    PL -.->|tier-2 LLM tools| WA
    PL -.->|tier-2 LLM tools| PA

    style FA fill:#4a9eff,color:#fff
    style PA fill:#4a9eff,color:#fff
    style UPA fill:#1a8a4a,color:#fff
    style WA fill:#1a8a4a,color:#fff
    style HA fill:#1a8a4a,color:#fff
    style CP fill:#9b59b6,color:#fff
    style OP fill:#9b59b6,color:#fff
    style PL fill:#9b59b6,color:#fff

Eight agents. The five tool agents you already know (blue and green), two LLM providers in purple (Claude and OpenAI), and the planner – now connected to everything. The solid arrow is a tier-1 dependency (prefetched before the LLM call). The dashed arrows are tier-2 (tools the LLM discovers and calls during its reasoning loop).

Today has six parts:

Add a second LLM provider – wrap OpenAI as a mesh capability
Provider tags and preference routing – teach +/- tag operators
Provider swap – zero code changes – stop Claude, watch failover
Connect the planner to tool agents – tier-1 prefetch and tier-2 tools
Call the enhanced planner – generate a plan with real data
Walk the trace – see the full call tree across all eight agents

Part 1: Add a second LLM provider

!!! note “API keys required” You need both ANTHROPIC_API_KEY and OPENAI_API_KEY set in your environment. If you don’t have an OpenAI key, create one here and export it: export OPENAI_API_KEY=sk-...

The OpenAI provider follows the exact same pattern as the Claude provider from Day 3. Same decorator, same zero-code body, different model string.

Scaffold the provider

$ meshctl scaffold llm-provider --vendor openai --lang python --name openai-provider --port 9108

Replace the generated main.py with:

> *See the source code in the day's example directory.*

The only differences from claude-provider:

model="openai/gpt-4o-mini" – LiteLLM routes this to the OpenAI API using your OPENAI_API_KEY.
tags=["openai", "gpt"] – different tags so consumers can distinguish between providers.

The capability name is still "llm" – both providers advertise the same capability. This is how the mesh supports multiple providers for the same function.

Start the provider

$ meshctl start --dte --debug -d -w openai-provider/main.py

Check the mesh:

$ meshctl list

Registry: running (http://localhost:8000) - 8 healthy

NAME                        RUNTIME   TYPE    STATUS    DEPS   ENDPOINT           AGE   LAST SEEN
claude-provider-0a89e8c6    Python    Agent   healthy   0/0    10.0.0.74:49486    1m    2s
flight-agent-a939da4b       Python    Agent   healthy   1/1    10.0.0.74:49480    1m    2s
hotel-agent-9932ac09        Python    Agent   healthy   0/0    10.0.0.74:49482    1m    2s
openai-provider-40a5c637    Python    Agent   healthy   0/0    10.0.0.74:49485    4s    4s
planner-agent-fb07b918      Python    Agent   healthy   1/1    10.0.0.74:49484    1m    2s
poi-agent-97bd9fcc          Python    Agent   healthy   1/1    10.0.0.74:49481    1m    2s
user-prefs-agent-87506c4a   Python    Agent   healthy   0/0    10.0.0.74:49479    1m    2s
weather-agent-a6f7ea5e      Python    Agent   healthy   0/0    10.0.0.74:49483    1m    2s

Eight agents. List the tools:

$ meshctl list --tools

TOOL                      AGENT                       CAPABILITY           TAGS
--------------------------------------------------------------------------------------------
claude_provider           claude-provider-0a89e8c6    llm                  claude
flight_search             flight-agent-a939da4b       flight_search        flights,travel
get_user_prefs            user-prefs-agent-87506c4a   user_preferences     preferences,travel
get_weather               weather-agent-a6f7ea5e      weather_forecast     weather,travel
hotel_search              hotel-agent-9932ac09        hotel_search         hotels,travel
openai_provider           openai-provider-40a5c637    llm                  openai,gpt
plan_trip                 planner-agent-fb07b918      trip_planning        planner,travel,llm
search_pois               poi-agent-97bd9fcc          poi_search           poi,travel

8 tool(s) found

Two tools with capability llm – claude_provider and openai_provider. Both are available. Right now, if the planner asks for {"capability": "llm"}, the registry picks one at random. You need a way to express a preference.

Part 2: Provider tags and preference routing

MCP Mesh tags support three operators for consumer-side selection:

Prefix	Meaning	Example
(none)	Required	`"api"` – must have this tag
`+`	Preferred	`"+claude"` – bonus if present
`-`	Excluded	`"-deprecated"` – reject if present

These operators are for the consumer side only (the provider= or dependencies= spec). When you declare tags on your provider, use plain strings without prefixes.

The matching algorithm:

Filter – remove candidates with any excluded tag (-)
Require – keep only candidates with all required tags (no prefix)
Score – add bonus points for each preferred tag (+) present
Select – return the highest-scoring candidate

Update the planner’s provider selection

In Day 3, the planner used provider={"capability": "llm"} – any provider will do. Now add a preference for Claude:

> *See the source code in the day's example directory.*

+claude means: “prefer a provider tagged claude. If one is available, route there. If not, fall back to any other provider with capability llm.” The + makes it a preference, not a requirement – the planner still works even if Claude is down.

Compare with alternatives:

"claude" (no prefix) – required. If Claude is down, the call fails. No fallback.
"+claude" – preferred. If Claude is down, route to the next available provider. Automatic failover.
"-gemini" – excluded. Never route to a provider tagged gemini, even if it’s the only one available.

Part 3: Provider swap – zero code changes

This is where capability-based routing pays off. You’ll call the planner three times, stopping and restarting Claude between calls, and watch the trace show different providers without changing a single line of code.

Call 1: Claude is preferred and available

$ meshctl call plan_trip '{"destination":"Kyoto","dates":"June 1-5, 2026","budget":"$2000"}' --trace

The response is a Kyoto itinerary. Check the trace:

$ meshctl trace <trace-id>

Call Tree for trace 16f53c4095e481d329515600024f365c
════════════════════════════════════════════════════════════

└─ plan_trip (planner-agent) [18349ms] ✓
   ├─ get_user_prefs (user-prefs-agent) [1ms] ✓
   └─ claude_provider (claude-provider) [18308ms] ✓
      ├─ search_pois (poi-agent) [31ms] ✓
      │  └─ get_weather (weather-agent) [1ms] ✓
      ├─ get_weather (weather-agent) [0ms] ✓
      ├─ get_weather (weather-agent) [0ms] ✓
      └─ hotel_search (hotel-agent) [1ms] ✓

────────────────────────────────────────────────────────────
Summary: 11 spans across 6 agents | 18.35s | ✓

The planner routed to claude_provider. The tool calls you see under the provider (search_pois, get_weather, hotel_search) are tier-2 calls – Claude decided to call those tools during its reasoning loop. More on that in Part 4.

Call 2: Stop Claude, watch failover

$ meshctl stop claude-provider

Agent 'claude-provider' stopped

Now call the planner again. Same code, same arguments, same mesh:

$ meshctl call plan_trip '{"destination":"Tokyo","dates":"June 10-14, 2026","budget":"$3000"}' --trace

The response is a Tokyo itinerary – generated by GPT, not Claude. Check the trace:

$ meshctl trace <trace-id>

Call Tree for trace 2c71f26f5df8bbe8efbdb36f4ddbbea8
════════════════════════════════════════════════════════════

└─ plan_trip (planner-agent) [15963ms] ✓
   ├─ get_user_prefs (user-prefs-agent) [0ms] ✓
   └─ openai_provider (openai-provider) [15928ms] ✓
      ├─ flight_search (flight-agent) [22ms] ✓
      │  └─ get_user_prefs (user-prefs-agent) [0ms] ✓
      ├─ hotel_search (hotel-agent) [0ms] ✓
      └─ search_pois (poi-agent) [12ms] ✓
         └─ get_weather (weather-agent) [0ms] ✓

────────────────────────────────────────────────────────────
Summary: 12 spans across 7 agents | 15.96s | ✓

openai_provider (openai-provider). Same planner code, same tools, different LLM. No code change, no config change, no restart. The registry saw that Claude was down, found another healthy provider with capability llm, and routed there.

Mesh UI Topology during failover — planner routed to openai-provider while claude-provider is down

Call 3: Restart Claude, verify preference

$ meshctl start --dte --debug -d -w claude-provider/main.py

Wait a few seconds for registration, then call again:

$ meshctl call plan_trip '{"destination":"Osaka","dates":"June 20-22, 2026","budget":"$1500"}' --trace

Check the trace:

$ meshctl trace <trace-id>

Call Tree for trace d208aeaebcc78ebfdaed968eebbeae28
════════════════════════════════════════════════════════════

└─ plan_trip (planner-agent) [18020ms] ✓
   ├─ get_user_prefs (user-prefs-agent) [0ms] ✓
   └─ claude_provider (claude-provider) [17984ms] ✓
      ├─ flight_search (flight-agent) [13ms] ✓
      │  └─ get_user_prefs (user-prefs-agent) [0ms] ✓
      ├─ get_weather (weather-agent) [0ms] ✓
      ├─ search_pois (poi-agent) [19ms] ✓
      │  └─ get_weather (weather-agent) [0ms] ✓
      └─ hotel_search (hotel-agent) [0ms] ✓

────────────────────────────────────────────────────────────
Summary: 18 spans across 7 agents | 18.02s | ✓

Back to claude_provider. The +claude preference kicks in again because Claude is healthy and has the highest tag score.

Mesh UI Topology with Claude back — planner prefers claude-provider via +claude tag

Notice that openai-provider is still healthy and connected to the mesh. The planner routes to claude-provider because of the +claude preference tag — not because OpenAI is unavailable. Both providers are ready; mesh picks the preferred one.

Three calls, three traces, two different providers. The planner’s code didn’t change once.

Part 4: Connect the planner to tool agents

On Day 3, the planner generated itineraries from the LLM’s training data – no real flight prices, no actual hotel availability. Today you’ll connect it to your tool agents using two dependency mechanisms.

Tier-1: prefetch dependencies

Tier-1 dependencies are fetched before the LLM call. Your code calls them explicitly and injects the results into the prompt context. The LLM always sees this data.

For the planner, that’s user_preferences – fetch the user’s travel preferences and include them in every prompt:

> *See the source code in the day's example directory.*

This is the same dependencies=[...] syntax from Day 2. The user_prefs parameter is injected by mesh DI, just like flight-agent gets its user_prefs dependency. The planner calls it before the LLM call and formats the result into a preferences summary string.

Tier-2: LLM-discoverable tools

Tier-2 tools are made available to the LLM during its reasoning loop. The LLM discovers them via their schemas and decides which to call based on the user’s question. You don’t call them – the LLM does.

> *See the source code in the day's example directory.*

The filter parameter tells the registry which tools to expose to the LLM:

{"capability": "flight_search"} – flights
{"capability": "hotel_search"} – hotels
{"capability": "weather_forecast"} – weather
{"capability": "poi_search"} – points of interest

filter_mode="all" means include every matching tool (not just the best match per capability). max_iterations=10 gives the LLM up to 10 rounds of tool calling – enough to search flights, check hotels, look up weather, and find attractions in a single planning session.

The two tiers together

Here is the updated planner with both tiers:

> *See the source code in the day's example directory.*

The execution flow:

Tier-1: user_prefs is called explicitly. The result is formatted and passed as context={"user_preferences": prefs_summary} to the LLM call. The Jinja template renders it into the system prompt.
Tier-2: flight_search, hotel_search, get_weather, search_pois are presented to the LLM as callable tools. The LLM decides which to call during await llm(...).

The distinction matters:

Tier-1 runs before the LLM, every time. You control what data the LLM sees. User preferences are always in the prompt.
Tier-2 runs during the LLM’s reasoning. The LLM chooses whether to search for flights or just answer from training data. You control which tools are available; the LLM controls which ones to use.

The updated prompt template

The Jinja template now includes user preferences:

> *See the source code in the day's example directory.*

The new guidelines tell the LLM to use the available tools for real data rather than guessing. The {{ user_preferences }} variable is populated from the tier-1 prefetch.

Part 5: Call the enhanced planner

With all eight agents running:

$ meshctl call plan_trip '{"destination":"Kyoto","dates":"June 1-5, 2026","budget":"$2000"}' --trace

The response now includes real data from your tool agents – flight prices from flight-agent, hotel options from hotel-agent, weather from weather-agent, and attractions from poi-agent. The LLM weaves this data into a coherent itinerary, respecting the user’s preferences (preferred airlines, minimum hotel stars, interests).

Part 6: Walk the trace

$ meshctl trace <trace-id>

└─ plan_trip (planner-agent) [18349ms] ✓
   ├─ get_user_prefs (user-prefs-agent) [1ms] ✓
   └─ claude_provider (claude-provider) [18308ms] ✓
      ├─ search_pois (poi-agent) [31ms] ✓
      │  └─ get_weather (weather-agent) [1ms] ✓
      ├─ get_weather (weather-agent) [0ms] ✓
      ├─ get_weather (weather-agent) [0ms] ✓
      └─ hotel_search (hotel-agent) [1ms] ✓

This is the most complex trace in the tutorial so far. Read it top to bottom:

plan_trip (planner-agent) – the entry point. Receives the user’s request.
get_user_prefs (user-prefs-agent) – tier-1 prefetch. The planner’s code calls this explicitly before the LLM. Takes 1ms. User preferences are now in the prompt context.
claude_provider (claude-provider) – the LLM call. The planner sends the rendered prompt (with user preferences baked in) plus the user message to Claude.
search_pois, get_weather, hotel_search – tier-2 tool calls. Claude decided to call these tools during its reasoning loop. Each tool call appears as a child span under claude_provider. Notice that search_pois triggers its own DI call to get_weather (from Day 2) – the dependency chain is fully traced.

The planner’s total time (~18 seconds) is mostly Claude’s inference. The mesh overhead – discovering tools, routing to providers, serializing requests – adds single-digit milliseconds.

!!! tip “Trace depth” The trace tree can go multiple levels deep. plan_trip calls claude_provider, which calls search_pois, which calls get_weather. Each hop is a separate span, linked by trace context that propagates automatically across mesh calls. You get this for free – no manual instrumentation.

Leave it running

Your eight agents are running in watch mode. On Day 5 you’ll add an HTTP gateway. No need to stop between chapters.

Troubleshooting

OPENAI_API_KEY not set. The openai-provider agent needs an OpenAI API key. Set it in your environment:

$ export OPENAI_API_KEY=sk-...

If the key is missing, the provider will start but LLM calls routed to it will fail with an authentication error.

Provider swap doesn’t work. Both providers must have the same capability name ("llm"). Check with meshctl list --tools – both claude_provider and openai_provider should show capability llm. If one shows a different capability, update the capability parameter in @mesh.llm_provider.

Tool calls not appearing in trace. Check two things:

The planner’s filter parameter lists the correct capabilities (flight_search, hotel_search, etc.).
max_iterations is high enough (10 is good). If set to 1, the LLM gets one shot and may not call any tools.

Planner returns a generic plan without real data. The LLM didn’t call the tier-2 tools. This can happen if:

The filter capabilities don’t match any registered tools. Verify with meshctl list --tools.
The system prompt doesn’t instruct the LLM to use tools. Check that plan_trip.j2 includes the guideline about using available tools.
filter_mode is set to something other than "all". Use "all" to expose all matching tools.

Tier-1 prefetch not working. Check that user-prefs-agent is running and the planner shows 1/1 in the DEPS column of meshctl list. If it shows 0/1, the dependency hasn’t resolved yet – wait a few seconds and check again.

Recap

You added a provider, swapped it with zero code changes, and connected the planner to real data sources. The planner’s code changed in two places: a tag preference and a dependency list. Everything else – failover, tool discovery, trace propagation – happened at runtime.

Next up

Day 5 wraps the trip planner in a FastAPI gateway, exposing it as a REST API with @mesh.route. Five lines of code, zero business logic in the gateway – just HTTP to mesh and back.

Day 5 – HTTP Gateway

Your trip planner works from the terminal via meshctl call. But real users need an HTTP API. Today you’ll wrap the planner in a FastAPI gateway – a thin REST endpoint that bridges HTTP requests to mesh tool calls. By the end of Part 1, you’ll have a complete, callable trip planning API.

What we’re building today

graph LR
    U[User] -->|"POST /plan"| GW[gateway]
    GW -->|"trip_planning"| PL[planner-agent]
    PL -->|"+claude"| CP[claude-provider]
    PL -.->|failover| OP[openai-provider]
    PL ==>|tier-1| UPA[user-prefs-agent]
    CP -.->|tier-2| FA[flight-agent]
    CP -.->|tier-2| HA[hotel-agent]
    CP -.->|tier-2| WA[weather-agent]
    CP -.->|tier-2| PA[poi-agent]
    FA -->|depends on| UPA
    PA -->|depends on| WA

    style U fill:#555,color:#fff
    style GW fill:#e67e22,color:#fff
    style PL fill:#9b59b6,color:#fff
    style CP fill:#9b59b6,color:#fff
    style OP fill:#9b59b6,color:#fff
    style FA fill:#4a9eff,color:#fff
    style PA fill:#4a9eff,color:#fff
    style UPA fill:#1a8a4a,color:#fff
    style WA fill:#1a8a4a,color:#fff
    style HA fill:#1a8a4a,color:#fff

Nine agents. Everything from Day 4 (blue, green, purple) plus the gateway in orange. The user sends an HTTP request to the gateway. The gateway calls the planner through mesh dependency injection. The planner calls the LLM provider, which calls the tool agents. The gateway doesn’t know any of this – it just calls plan_trip and returns the result.

Today has four parts:

Build the gateway – a FastAPI app with @mesh.route
Start the gateway – add it to your running mesh
Call the API – curl the gateway and compare with meshctl call
Walk the trace – see the full call tree from HTTP to tool agents

Part 1: Build the gateway

Scaffold the gateway

$ meshctl scaffold --name gateway --agent-type api --lang python --port 8080

Replace the generated main.py with:

> *See the source code in the day's example directory.*

That’s the entire gateway. Three imports, a health check, and one route handler.

How @mesh.route works

@mesh.route is a decorator for FastAPI handlers that injects mesh capabilities as function parameters – the same dependency injection that @mesh.tool uses, but for HTTP endpoints instead of MCP tools.

> *See the source code in the day's example directory.*

The key line is @mesh.route(dependencies=["trip_planning"]). This tells mesh: “Before this handler runs, resolve the trip_planning capability and inject it as a callable.” The parameter name plan_trip matches the tool name registered by planner-agent. The type hint McpMeshTool tells mesh to inject a tool proxy.

The handler is five lines of code:

Parse the JSON body.
Check that the tool was injected (defensive – it should always resolve if the planner is running).
Call the injected tool with the request parameters.
Return the result.

The gateway doesn’t import the planner. It doesn’t know the planner’s URL. It declares a dependency on trip_planning, and mesh injects a callable. When you add new tool agents on Day 6, the gateway won’t change – it calls the planner, and the planner discovers new tools automatically.

Part 2: Start the gateway

Your eight agents from Day 4 should still be running. Add the gateway:

$ meshctl start --dte --debug -d -w gateway/main.py

Check the mesh:

$ meshctl list

Registry: running (http://localhost:8000) - 9 healthy

NAME                        RUNTIME   TYPE    STATUS    DEPS   ENDPOINT           AGE   LAST SEEN
claude-provider-0a89e8c6    Python    Agent   healthy   0/0    10.0.0.74:49486    10m   2s
flight-agent-a939da4b       Python    Agent   healthy   1/1    10.0.0.74:49480    10m   2s
gateway-7b3f2e91            Python    API     healthy   1/1    10.0.0.74:8080     4s    4s
hotel-agent-9932ac09        Python    Agent   healthy   0/0    10.0.0.74:49482    10m   2s
openai-provider-40a5c637    Python    Agent   healthy   0/0    10.0.0.74:49485    10m   2s
planner-agent-fb07b918      Python    Agent   healthy   1/1    10.0.0.74:49484    10m   2s
poi-agent-97bd9fcc          Python    Agent   healthy   1/1    10.0.0.74:49481    10m   2s
user-prefs-agent-87506c4a   Python    Agent   healthy   0/0    10.0.0.74:49479    10m   2s
weather-agent-a6f7ea5e      Python    Agent   healthy   0/0    10.0.0.74:49483    10m   2s

Nine agents. The gateway shows type API (not Agent) and its dependency 1/1 resolved – it found the trip_planning capability from planner-agent.

List the tools:

$ meshctl list --tools

TOOL                      AGENT                       CAPABILITY           TAGS
--------------------------------------------------------------------------------------------
claude_provider           claude-provider-0a89e8c6    llm                  claude
flight_search             flight-agent-a939da4b       flight_search        flights,travel
get_user_prefs            user-prefs-agent-87506c4a   user_preferences     preferences,travel
get_weather               weather-agent-a6f7ea5e      weather_forecast     weather,travel
hotel_search              hotel-agent-9932ac09        hotel_search         hotels,travel
openai_provider           openai-provider-40a5c637    llm                  openai,gpt
plan_trip                 planner-agent-fb07b918      trip_planning        planner,travel,llm
search_pois               poi-agent-97bd9fcc          poi_search           poi,travel

8 tool(s) found

The gateway doesn’t appear in the tool list – it doesn’t expose any tools. It consumes the trip_planning capability via @mesh.route, not @mesh.tool. This is the difference between an API agent and a tool agent: API agents are HTTP entry points into the mesh, not MCP tool providers.

Mesh UI Topology showing nine agents with the API gateway at the top

Part 3: Call the API

Via curl

$ curl -s -X POST http://localhost:8080/plan \
    -H "Content-Type: application/json" \
    -d '{"destination":"Kyoto","dates":"June 1-5, 2026","budget":"$2000"}'

{
  "result": "## Kyoto Trip Itinerary: June 1-5, 2026\n\n**Budget: $2,000**\n\n### Day 1 (June 1) - Arrival & Eastern Kyoto\n\n**Morning:**\n- Arrive via SQ017 ($901) — preferred airline per your preferences\n- Check into Sakura Inn ($95/night, 3-star) — meets your minimum star rating\n\n**Afternoon:**\n- Visit Fushimi Inari Shrine (cultural — matches your interests)\n- Walk the thousand torii gates trail\n\n**Evening:**\n- Dinner at Nishiki Market area — street food tour (food interest)\n- Explore Gion district\n\n..."
}

A full trip itinerary, personalized with the user’s preferences (preferred airlines, hotel stars, interests), built from real data returned by your tool agents.

Via meshctl

For comparison, the same call through meshctl:

$ meshctl call plan_trip '{"destination":"Kyoto","dates":"June 1-5, 2026","budget":"$2000"}' --trace

Same result, different transport. The curl path goes user -> gateway -> planner -> LLM -> tools. The meshctl path goes user -> registry -> planner -> LLM -> tools. Both end up at the same planner with the same tools.

Part 4: Walk the trace

If you called via meshctl --trace, you got a trace ID. View it:

$ meshctl trace <trace-id>

Call Tree for trace a4e8b2c91f7d3e56a8120900037f48d1
════════════════════════════════════════════════════════════

└─ plan_trip (planner-agent) [17842ms] ✓
   ├─ get_user_prefs (user-prefs-agent) [1ms] ✓
   └─ claude_provider (claude-provider) [17803ms] ✓
      ├─ flight_search (flight-agent) [15ms] ✓
      │  └─ get_user_prefs (user-prefs-agent) [0ms] ✓
      ├─ hotel_search (hotel-agent) [1ms] ✓
      ├─ get_weather (weather-agent) [0ms] ✓
      ├─ search_pois (poi-agent) [22ms] ✓
      │  └─ get_weather (weather-agent) [0ms] ✓
      └─ get_weather (weather-agent) [0ms] ✓

────────────────────────────────────────────────────────────
Summary: 14 spans across 7 agents | 17.84s | ✓

The full call tree: planner prefetches user preferences (tier-1), calls Claude (who calls flight, hotel, weather, and POI tools during its reasoning loop), and returns the assembled itinerary. Every hop is a separate span with sub-millisecond mesh overhead.

!!! tip “The thin wrapper pattern” The gateway has no business logic. It translates HTTP to mesh and mesh to HTTP. That’s it. When you add a new tool agent on Day 6, the gateway doesn’t change – it calls the planner, and the planner discovers new tools automatically. If you need a second endpoint (say, POST /flights for direct flight search), you add one @mesh.route handler. The gateway stays thin.

Cross-language gateway swap

!!! tip “Choose your adventure” One of mesh’s strengths is that any agent – including the gateway – can be swapped for a different language without changing anything else. The planner, providers, and tool agents don’t care what language the gateway is written in.

Want to see this in action? Pick one:

- **[Build the gateway in Spring Boot](../java/spring-boot-integration.md)** --
  same REST endpoints, same mesh DI, Java instead of Python
- **[Build the gateway in Express](../typescript/express-integration.md)** --
  same endpoints, TypeScript
- **Skip** -- continue to [Day 6](day-06-chat-history.md) with the FastAPI
  gateway

Stop the Python gateway with `meshctl stop gateway`, build the replacement
in your language of choice, and start it with `meshctl start`. The rest of
the mesh keeps running.

Part 1 complete

That’s Part 1. You have a working trip planner: nine agents, two LLM providers with automatic failover, dependency injection across tools and providers, prompt templates, distributed traces, and an HTTP API. All of it running locally with meshctl start and an observability stack in Docker.

Part 2 grows this into something production-shaped – chat history, specialist committees, Docker Compose packaging, Kubernetes deployment, and a full observability walkthrough.

Leave it running

Your nine agents are running in watch mode. On Day 6 you’ll add Redis-backed chat history. No need to stop between chapters.

Troubleshooting

Port 8080 already in use. The gateway defaults to port 8080. If another service is using that port, either stop the conflicting service or change the port in gateway/main.py:

uvicorn.run(app, host="0.0.0.0", port=8081, log_level="info")

FastAPI not installed. The gateway requires fastapi and uvicorn. If you see ModuleNotFoundError: No module named 'fastapi', install them in your venv:

$ pip install fastapi uvicorn

Gateway starts but curl fails. Check three things:

The gateway is healthy: meshctl list should show gateway with status healthy and deps 1/1.
You’re using the correct port: check the meshctl list output for the gateway’s endpoint.
The planner is running: the gateway depends on trip_planning. If the planner is down, the gateway starts but tool injection fails.

curl returns an error response. If the response is {"error": "trip_planning capability unavailable"}, the planner hasn’t registered yet or its dependency on llm hasn’t resolved. Check meshctl list – the planner should show healthy with deps 1/1. Also verify your LLM API keys are set (ANTHROPIC_API_KEY or OPENAI_API_KEY).

curl returns empty or truncated response. The LLM is still generating. Trip planning calls take 15-20 seconds depending on the LLM provider. If curl times out, increase the timeout:

$ curl -s --max-time 60 -X POST http://localhost:8080/plan ...

Recap

You wrapped your trip planner in a five-line FastAPI handler, bridging HTTP to mesh with @mesh.route. The gateway is a thin entry point – no business logic, no planner imports, no hardcoded URLs. It declares what it needs (trip_planning), mesh injects a callable, and the handler forwards the request. Two transports (curl and meshctl) reach the same planner through different paths.

Next up

Day 6 adds Redis-backed chat history so users can iterate on their trip plans across multiple turns.

Day 6 – Chat History

Your trip planner generates great itineraries, but every call starts from scratch. Real users iterate – “make it cheaper,” “add a beach day,” “what about hotels near the train station.” Today you add conversation memory so the planner remembers what you have discussed.

What we’re building today

graph LR
    U[User] -->|"POST /plan"| GW[gateway]
    GW -->|"trip_planning"| PL[planner-agent]
    PL -->|"chat_history"| CH[chat-history-agent]
    PL -->|"+claude"| CP[claude-provider]
    PL -.->|failover| OP[openai-provider]
    PL ==>|tier-1| UPA[user-prefs-agent]
    CP -.->|tier-2| FA[flight-agent]
    CP -.->|tier-2| HA[hotel-agent]
    CP -.->|tier-2| WA[weather-agent]
    CP -.->|tier-2| PA[poi-agent]
    FA -->|depends on| UPA
    PA -->|depends on| WA

    style U fill:#555,color:#fff
    style GW fill:#e67e22,color:#fff
    style CH fill:#1abc9c,color:#fff
    style PL fill:#9b59b6,color:#fff
    style CP fill:#9b59b6,color:#fff
    style OP fill:#9b59b6,color:#fff
    style FA fill:#4a9eff,color:#fff
    style PA fill:#4a9eff,color:#fff
    style UPA fill:#1a8a4a,color:#fff
    style WA fill:#1a8a4a,color:#fff
    style HA fill:#1a8a4a,color:#fff

Ten agents. Everything from Day 5 plus chat-history-agent in teal. The planner fetches prior turns from chat history before calling the LLM, and saves both the user message and the response afterward. The gateway stays thin – it just passes the session ID through.

Today has four parts:

Build the chat history agent – a tool agent backed by Redis
Update the planner – add history fetch and save around the LLM call
Update the gateway – add session ID passthrough
Walk the trace – see history calls in the distributed trace

Part 1: Build the chat history agent

Chat history is just another mesh tool agent. The same dependency injection that wires flight-agent wires chat-history-agent. There is no special framework primitive for state – you write an agent that wraps a data store, and other agents call it like any other tool.

Scaffold the agent

$ meshctl scaffold --name chat-history-agent --agent-type tool --port 9109

Created agent 'chat-history-agent' in chat-history-agent/

Generated files:
  chat-history-agent/
  ├── .dockerignore
  ├── Dockerfile
  ├── README.md
  ├── __init__.py
  ├── __main__.py
  ├── helm-values.yaml
  ├── main.py
  └── requirements.txt

Add Redis to requirements

The agent needs redis-py to talk to the Redis instance from your observability stack (Day 3’s docker-compose.observability.yml already runs Redis on port 6379):

> *See the source code in the day's example directory.*

Replace main.py

Replace the generated main.py with:

> *See the source code in the day's example directory.*

Two tools, one capability. save_turn appends a JSON-encoded turn to a Redis list keyed by session ID. get_history reads the most recent turns from that list. Both tools share the chat_history capability – when the planner declares a dependency on chat_history, mesh injects a proxy that can call either tool by name.

The Redis connection is straightforward: a module-level redis.Redis client pointed at localhost:6379 (configurable via environment variables for Docker/Kubernetes deployment).

> *See the source code in the day's example directory.*

Why this works

Swap Redis for Postgres by editing one agent. Add encryption by extending one agent. The gateway and planner do not move. mesh does not need a chat history primitive – the general abstraction (any MCP tool anywhere is a local function call) handles it.

Part 2: Update the planner

The planner gains chat history as a tier-1 dependency alongside user preferences. It fetches history before the LLM call and saves turns after. The gateway stays thin – it just passes the session ID.

> *See the source code in the day's example directory.*

Dependency declaration

The @mesh.tool decorator now declares two dependencies instead of one:

> *See the source code in the day's example directory.*

Both user_preferences and chat_history are tier-1 dependencies – resolved before the tool function runs. The planner calls chat_history.call_tool("get_history", {...}) and chat_history.call_tool("save_turn", {...}) because the chat_history capability exposes two tools. For user_prefs, the single-tool shorthand (await user_prefs(...)) still works.

History fetch

Before the LLM call, the planner fetches the conversation history for the current session:

> *See the source code in the day's example directory.*

Multi-turn messages

When history is present, the planner passes the full message list to the LLM instead of a single string:

> *See the source code in the day's example directory.*

The @mesh.llm decorator handles multi-turn natively – pass a list of {"role": "...", "content": "..."} dicts as the first argument to llm() and the decorator builds the correct LLM API call. The system prompt from the Jinja2 template is inserted automatically.

History save

After the LLM responds, the planner saves both the user turn and the assistant turn so the next request sees them:

> *See the source code in the day's example directory.*

Part 3: Update the gateway

The gateway gains a session_id parameter. Everything else stays the same – one dependency, five lines of code.

> *See the source code in the day's example directory.*

Session ID

> *See the source code in the day's example directory.*

If the client sends X-Session-Id, the gateway uses it. Otherwise it generates a UUID and returns it in the response so the client can use it for follow-up calls. The gateway passes session_id to the planner alongside the trip parameters – the planner handles the rest.

Start and test

Install redis-py

If redis is not already in your venv:

$ pip install redis

Start the chat history agent

Your nine agents from Day 5 should still be running. Add chat-history-agent:

$ meshctl start --dte --debug -d -w chat-history-agent/main.py

If you are starting fresh, launch everything at once:

$ meshctl start --dte --debug -d -w \
    chat-history-agent/main.py \
    claude-provider/main.py \
    openai-provider/main.py \
    flight-agent/main.py \
    hotel-agent/main.py \
    weather-agent/main.py \
    poi-agent/main.py \
    user-prefs-agent/main.py \
    planner-agent/main.py \
    gateway/main.py

Check the mesh:

$ meshctl list

Registry: running (http://localhost:8000) - 10 healthy

NAME                             RUNTIME   TYPE    STATUS    DEPS   ENDPOINT           AGE   LAST SEEN
chat-history-agent-3f2a1b9c      Python    Agent   healthy   0/0    10.0.0.74:9109     8s    2s
claude-provider-0a89e8c6         Python    Agent   healthy   0/0    10.0.0.74:49486    15m   2s
flight-agent-a939da4b            Python    Agent   healthy   1/1    10.0.0.74:49480    15m   2s
gateway-7b3f2e91                 Python    API     healthy   1/1    10.0.0.74:8080     5m    2s
hotel-agent-9932ac09             Python    Agent   healthy   0/0    10.0.0.74:49482    15m   2s
openai-provider-40a5c637         Python    Agent   healthy   0/0    10.0.0.74:49485    15m   2s
planner-agent-fb07b918           Python    Agent   healthy   2/2    10.0.0.74:49484    15m   2s
poi-agent-97bd9fcc               Python    Agent   healthy   1/1    10.0.0.74:49481    15m   2s
user-prefs-agent-87506c4a        Python    Agent   healthy   0/0    10.0.0.74:49479    15m   2s
weather-agent-a6f7ea5e           Python    Agent   healthy   0/0    10.0.0.74:49483    15m   2s

Ten agents. The gateway shows 1/1 dependency – just trip_planning. The planner shows 2/2 dependencies – it resolved both user_preferences and chat_history.

List the tools:

$ meshctl list --tools

TOOL                      AGENT                            CAPABILITY           TAGS
-----------------------------------------------------------------------------------------------
claude_provider           claude-provider-0a89e8c6         llm                  claude
flight_search             flight-agent-a939da4b            flight_search        flights,travel
get_history               chat-history-agent-3f2a1b9c      chat_history         chat,history,state
get_user_prefs            user-prefs-agent-87506c4a        user_preferences     preferences,travel
get_weather               weather-agent-a6f7ea5e           weather_forecast     weather,travel
hotel_search              hotel-agent-9932ac09             hotel_search         hotels,travel
openai_provider           openai-provider-40a5c637         llm                  openai,gpt
plan_trip                 planner-agent-fb07b918           trip_planning        planner,travel,llm
save_turn                 chat-history-agent-3f2a1b9c      chat_history         chat,history,state
search_pois               poi-agent-97bd9fcc               poi_search           poi,travel

10 tool(s) found

Two new tools: save_turn and get_history, both from chat-history-agent.

Mesh UI Topology showing ten agents with chat-history-agent connected to planner

Multi-turn demo

Turn 1 – plan a trip:

$ curl -s -X POST http://localhost:8080/plan \
    -H "Content-Type: application/json" \
    -H "X-Session-Id: test-session-1" \
    -d '{"destination":"Kyoto","dates":"June 1-5, 2026","budget":"$2000"}'

{
  "result": "## Kyoto Trip Itinerary: June 1-5, 2026\n\n**Budget: $2,000**\n\n### Day 1 (June 1) - Arrival & Eastern Kyoto\n\n**Morning:**\n- Arrive via SQ017 ($901) — preferred airline per your preferences\n- Check into Sakura Inn ($95/night, 3-star) — meets your minimum star rating\n\n**Afternoon:**\n- Visit Fushimi Inari Shrine (cultural — matches your interests)\n...",
  "session_id": "test-session-1"
}

Turn 2 – iterate on the plan:

$ curl -s -X POST http://localhost:8080/plan \
    -H "Content-Type: application/json" \
    -H "X-Session-Id: test-session-1" \
    -d '{"destination":"Kyoto","dates":"June 1-5, 2026","budget":"$1500","message":"Can you make it cheaper? I want to stay under $1500."}'

{
  "result": "## Revised Kyoto Itinerary: June 1-5, 2026\n\n**Budget: $1,500** (revised from $2,000)\n\n### Changes from Previous Plan\n- Switched to MH007 ($842, saving $59) — still a preferred airline\n- Downgraded to Capsule Stay ($45/night, saving $200 over 4 nights)\n- Replaced paid attractions with free alternatives\n\n### Day 1 (June 1) - Arrival\n...",
  "session_id": "test-session-1"
}

The second response references the first plan – it knows about the previous hotel choice, the original budget, and the itinerary structure. This is the conversation history at work: the planner fetched the prior turns from Redis, passed them to the LLM as a multi-turn message list, and the LLM responded with awareness of the full dialogue.

Turn 3 – ask a question:

$ curl -s -X POST http://localhost:8080/plan \
    -H "Content-Type: application/json" \
    -H "X-Session-Id: test-session-1" \
    -d '{"destination":"Kyoto","dates":"June 1-5, 2026","budget":"$1500","message":"What if I skip the flight and take the Shinkansen from Tokyo instead?"}'

The planner sees all three turns and adjusts accordingly. Each turn adds to the Redis list, and the next request reads the full history.

Part 4: Walk the trace

Open the mesh UI to view the trace:

$ meshctl start --ui -d

Navigate to http://localhost:3080 and click the most recent trace. The call tree shows the planner’s orchestration – history fetch and save happen inside the planner, not the gateway:

└─ plan_trip (planner-agent) [18542ms] ✓
   ├─ get_history (chat-history-agent) [2ms] ✓
   ├─ get_user_prefs (user-prefs-agent) [1ms] ✓
   ├─ claude_provider (claude-provider) [18451ms] ✓
   │  ├─ flight_search (flight-agent) [14ms] ✓
   │  │  └─ get_user_prefs (user-prefs-agent) [0ms] ✓
   │  ├─ hotel_search (hotel-agent) [1ms] ✓
   │  ├─ get_weather (weather-agent) [0ms] ✓
   │  └─ search_pois (poi-agent) [21ms] ✓
   │     └─ get_weather (weather-agent) [0ms] ✓
   ├─ save_turn (chat-history-agent) [1ms] ✓
   └─ save_turn (chat-history-agent) [1ms] ✓

The flow reads top to bottom: fetch history (2ms), prefetch user preferences (1ms), run the LLM (18s, most of which is the LLM reasoning loop), save the user message (1ms), save the assistant response (1ms). The chat history calls add negligible overhead – Redis round-trips are sub-millisecond.

!!! note “Stateful concerns are just agents” Redis-backed chat history, user profiles, booking state, audit logs – they are all the same pattern: a mesh tool agent wrapping a data store. mesh does not need a special primitive for each one. The general abstraction – any MCP tool anywhere is a local function call – handles them all. Want to swap Redis for Postgres? Edit one agent. Want to add message encryption? Extend one agent. The gateway and planner do not change.

Leave it running

Your ten agents are running in watch mode. On Day 7 you will add a committee of specialists. No need to stop between chapters.

Troubleshooting

Redis connection refused. The chat-history-agent connects to Redis on localhost:6379. Make sure the observability stack is running:

$ docker compose -f docker-compose.observability.yml up -d

Check Redis is healthy:

$ docker compose -f docker-compose.observability.yml ps redis

History not persisting across calls. Verify you are sending the same X-Session-Id header in both requests. If the header is missing, the gateway generates a new UUID for each call – each turn gets its own session with no shared history. Check the session_id field in the response.

Second turn does not reference the first. Three things to check:

The chat_history dependency resolved: meshctl list should show the planner with 2/2 deps.
Redis contains the turns: redis-cli LRANGE chat:test-session-1 0 -1 should show the saved JSON.
The planner received the history: check the trace for get_history returning a non-empty list. If the planner’s max_iterations is too low, the LLM may not fully process the history before hitting the iteration cap.

ModuleNotFoundError: No module named ‘redis’. Install redis-py in your venv:

$ pip install redis

Recap

You added multi-turn chat history to the trip planner by building one new agent and updating two existing ones. The chat-history-agent wraps Redis with two tools (save_turn, get_history). The planner owns the full chat lifecycle – it fetches history before the LLM call and saves turns after. The gateway stays thin: one dependency, session ID passthrough. No framework changes, no special chat primitives – just another mesh tool agent wired through dependency injection.

Next up

Day 7 adds a committee of specialists – three LLM agents (budget analyst, adventure advisor, logistics planner) that the planner consults in parallel before producing the final itinerary.

Day 7 – Committee of Specialists

Your planner generates solid itineraries, but a single LLM perspective has blind spots. A budget-conscious traveler needs cost analysis. An adventurous one needs hidden gems. Everyone needs logistics that actually work. Today you add three specialist agents – each with its own expertise – and have the planner consult all of them before producing the final plan.

What we’re building today

graph LR
    U[User] -->|"POST /plan"| GW[gateway]
    GW -->|"trip_planning"| PL[planner-agent]
    PL ==>|tier-1| CH[chat-history-agent]
    PL -->|"+claude"| CP[claude-provider]
    PL -.->|failover| OP[openai-provider]
    PL ==>|tier-1| UPA[user-prefs-agent]
    PL ==>|fan-out| BA[budget-analyst]
    PL ==>|fan-out| AA[adventure-advisor]
    PL ==>|fan-out| LP[logistics-planner]
    BA -->|llm| CP
    AA -->|llm| CP
    LP -->|llm| CP
    CP -.->|tier-2| FA[flight-agent]
    CP -.->|tier-2| HA[hotel-agent]
    CP -.->|tier-2| WA[weather-agent]
    CP -.->|tier-2| PA[poi-agent]
    FA -->|depends on| UPA
    PA -->|depends on| WA

    style U fill:#555,color:#fff
    style GW fill:#e67e22,color:#fff
    style CH fill:#1abc9c,color:#fff
    style PL fill:#9b59b6,color:#fff
    style CP fill:#9b59b6,color:#fff
    style OP fill:#9b59b6,color:#fff
    style BA fill:#f39c12,color:#fff
    style AA fill:#f39c12,color:#fff
    style LP fill:#f39c12,color:#fff
    style FA fill:#4a9eff,color:#fff
    style PA fill:#4a9eff,color:#fff
    style UPA fill:#1a8a4a,color:#fff
    style WA fill:#1a8a4a,color:#fff
    style HA fill:#1a8a4a,color:#fff

Thirteen agents. Everything from Day 6 plus three specialists in gold. The planner generates a base itinerary, then fans out to three specialist LLM agents in parallel. Each specialist returns structured data – a Pydantic model – which the planner synthesizes into the final response.

Today has five parts:

Structured outputs – Pydantic return types on @mesh.llm agents
Build the specialists – scaffold three LLM agents with structured outputs
Update the planner – add committee dependencies and parallel fan-out
Start and test – launch 13 agents, call the planner, see enhanced results
Walk the trace – fan-out trace showing the planner calling specialists in parallel

Part 1: Structured outputs

When an @mesh.llm function returns str, the LLM’s text response passes through as-is. When it returns a Pydantic BaseModel, mesh instructs the LLM to produce JSON matching the schema and validates the response automatically. No special parameter needed – the return type annotation controls format.

Here is the budget specialist’s output model:

> *See the source code in the day's example directory.*

The BudgetAnalysis model has three fields: total_estimated (an integer), savings_tips (a list of strings), and budget_breakdown (a list of BudgetItem sub-models with per-category costs). When the LLM returns, mesh validates the response against this schema. If the LLM produces invalid JSON, mesh retries automatically.

!!! tip “Use typed models, not dict” Define typed Pydantic sub-models (like BudgetItem) instead of bare dict for list fields. Typed models produce explicit JSON schemas that work across all LLM providers – Claude, GPT, Gemini – without schema compatibility issues. If you use list[dict], some providers may reject the schema or return unpredictable field names. Typed models also give the LLM a clearer contract, producing more consistent results.

The same pattern applies to the other two specialists. Each defines its own Pydantic model with fields specific to its domain.

Part 2: Build the specialists

Budget analyst

Scaffold the agent:

$ meshctl scaffold --name budget-analyst --agent-type llm-agent --port 9110

Created agent 'budget-analyst' in budget-analyst/

Generated files:
  budget-analyst/
  ├── .dockerignore
  ├── Dockerfile
  ├── README.md
  ├── __init__.py
  ├── __main__.py
  ├── helm-values.yaml
  ├── main.py
  ├── prompts/
  │   └── budget-analyst.jinja2
  └── requirements.txt

Replace main.py with:

> *See the source code in the day's example directory.*

The function takes destination, plan_summary, and budget as input. It calls the LLM with a single prompt, and the return type BudgetAnalysis tells mesh to validate the response as structured JSON. The max_iterations=1 setting means no tool loop – the specialist makes one LLM call and returns.

Replace the prompt template at prompts/budget_analysis.j2:

> *See the source code in the day's example directory.*

Adventure advisor

Scaffold:

$ meshctl scaffold --name adventure-advisor --agent-type llm-agent --port 9111

Replace main.py:

> *See the source code in the day's example directory.*

The AdventureAdvice model returns unique_experiences (a list of Experience sub-models with name, description, and why_special), local_gems (list of strings), and off_beaten_path (a paragraph of text).

Replace the prompt at prompts/adventure_advice.j2:

> *See the source code in the day's example directory.*

Logistics planner

Scaffold:

$ meshctl scaffold --name logistics-planner --agent-type llm-agent --port 9112

Replace main.py:

> *See the source code in the day's example directory.*

The LogisticsPlan model returns daily_schedule, transit_tips, and time_optimization. Each specialist follows the same pattern: define a Pydantic model, write a Jinja prompt, return the model type from the function.

Replace the prompt at prompts/logistics_plan.j2:

> *See the source code in the day's example directory.*

Part 3: Update the planner

The planner needs two changes: declare the specialist capabilities as dependencies, and fan out to them after generating the base plan.

Add dependencies

The @mesh.tool decorator now lists four dependencies instead of one:

> *See the source code in the day's example directory.*

Mesh resolves each capability to an McpMeshTool proxy. The planner function signature gains three new parameters – budget_analyst, adventure_advisor, and logistics_planner – each injected automatically by mesh.

Fan out with asyncio.gather

After the LLM generates a base plan, the planner calls all three specialists in parallel:

> *See the source code in the day's example directory.*

Each specialist receives the destination and the base plan summary. The planner waits for all three to complete, then appends their insights to the response. Because each specialist is an independent LLM call with max_iterations=1, they run concurrently without interference.

Full updated planner

Here is the complete updated main.py:

> *See the source code in the day's example directory.*

The planner’s description changes to reflect its new role as coordinator. The core LLM call is unchanged – it still generates the base itinerary using flight, hotel, weather, and POI data. The committee adds depth without replacing the original planning logic.

Part 4: Start and test

Start the specialist agents

Your ten agents from Day 6 should still be running. Add the three specialists:

$ meshctl start --dte --debug -d -w \
    budget-analyst/main.py \
    adventure-advisor/main.py \
    logistics-planner/main.py

If you are starting fresh, launch everything at once:

$ meshctl start --dte --debug -d -w \
    budget-analyst/main.py \
    adventure-advisor/main.py \
    logistics-planner/main.py \
    claude-provider/main.py \
    openai-provider/main.py \
    flight-agent/main.py \
    hotel-agent/main.py \
    weather-agent/main.py \
    poi-agent/main.py \
    user-prefs-agent/main.py \
    chat-history-agent/main.py \
    planner-agent/main.py \
    gateway/main.py

Check the mesh:

$ meshctl list

Registry: running (http://localhost:8000) - 13 healthy

NAME                             RUNTIME   TYPE    STATUS    DEPS   ENDPOINT           AGE   LAST SEEN
adventure-advisor-7c4e2f1a       Python    Agent   healthy   0/0    10.0.0.74:9111     8s    2s
budget-analyst-5a1d3b8e          Python    Agent   healthy   0/0    10.0.0.74:9110     8s    2s
chat-history-agent-3f2a1b9c      Python    Agent   healthy   0/0    10.0.0.74:9109     20m   2s
claude-provider-0a89e8c6         Python    Agent   healthy   0/0    10.0.0.74:49486    35m   2s
flight-agent-a939da4b            Python    Agent   healthy   1/1    10.0.0.74:49480    35m   2s
gateway-7b3f2e91                 Python    API     healthy   1/1    10.0.0.74:8080     25m   2s
hotel-agent-9932ac09             Python    Agent   healthy   0/0    10.0.0.74:49482    35m   2s
logistics-planner-9f6b4d2c       Python    Agent   healthy   0/0    10.0.0.74:9112     8s    2s
openai-provider-40a5c637         Python    Agent   healthy   0/0    10.0.0.74:49485    35m   2s
planner-agent-fb07b918           Python    Agent   healthy   5/5    10.0.0.74:49484    35m   2s
poi-agent-97bd9fcc               Python    Agent   healthy   1/1    10.0.0.74:49481    35m   2s
user-prefs-agent-87506c4a        Python    Agent   healthy   0/0    10.0.0.74:49479    35m   2s
weather-agent-a6f7ea5e           Python    Agent   healthy   0/0    10.0.0.74:49483    35m   2s

Thirteen agents. The planner now shows 5/5 dependencies – user_preferences, chat_history, plus the three specialist capabilities.

List the tools:

$ meshctl list --tools

TOOL                      AGENT                            CAPABILITY           TAGS
-----------------------------------------------------------------------------------------------
adventure_advice          adventure-advisor-7c4e2f1a       adventure_advice     specialist,adventure,llm
budget_analysis           budget-analyst-5a1d3b8e          budget_analysis      specialist,budget,llm
claude_provider           claude-provider-0a89e8c6         llm                  claude
flight_search             flight-agent-a939da4b            flight_search        flights,travel
get_history               chat-history-agent-3f2a1b9c      chat_history         chat,history,state
get_user_prefs            user-prefs-agent-87506c4a        user_preferences     preferences,travel
get_weather               weather-agent-a6f7ea5e           weather_forecast     weather,travel
hotel_search              hotel-agent-9932ac09             hotel_search         hotels,travel
logistics_planning        logistics-planner-9f6b4d2c       logistics_planning   specialist,logistics,llm
openai_provider           openai-provider-40a5c637         llm                  openai,gpt
plan_trip                 planner-agent-fb07b918           trip_planning        planner,travel,llm
save_turn                 chat-history-agent-3f2a1b9c      chat_history         chat,history,state
search_pois               poi-agent-97bd9fcc               poi_search           poi,travel

13 tool(s) found

Three new specialist tools: budget_analysis, adventure_advice, and logistics_planning.

Mesh UI Topology showing thirteen agents with committee fan-out pattern

Call the planner

$ curl -s -X POST http://localhost:8080/plan \
    -H "Content-Type: application/json" \
    -H "X-Session-Id: test-session-day7" \
    -d '{"destination":"Kyoto","dates":"June 1-5, 2026","budget":"$2000"}'

The response now includes the base itinerary followed by specialist insights:

{
  "result": "## Kyoto Trip Itinerary: June 1-5, 2026\n\n**Budget: $2,000**\n\n### Day 1 (June 1) - Arrival & Eastern Kyoto\n...\n\n---\n## Specialist Insights\n\n### Budget Analysis\n{\"total_estimated\": 1847, \"savings_tips\": [\"Book flights 3 weeks in advance for 15% savings\", \"Use a Kyoto Bus Day Pass ($6/day) instead of taxis\", \"Eat at konbini (convenience stores) for 2 meals/day to save $30/day\"], \"budget_breakdown\": [{\"category\": \"flights\", \"amount\": 901}, {\"category\": \"hotels\", \"amount\": 380}, {\"category\": \"food\", \"amount\": 300}, {\"category\": \"activities\", \"amount\": 150}, {\"category\": \"transport\", \"amount\": 116}]}\n\n### Adventure Recommendations\n{\"unique_experiences\": [{\"name\": \"Fushimi Inari at dawn\", \"description\": \"Hike the thousand torii gates before 6am when the shrine is empty\", \"why_special\": \"Most tourists arrive after 9am — the early morning light through the gates is unforgettable\"}, ...], \"local_gems\": [\"Nishiki Market back alleys\", \"Philosopher's Path at sunset\", \"Tofuku-ji moss garden\"], \"off_beaten_path\": \"Skip the tourist-heavy Arashiyama bamboo grove midday. Instead, rent a bicycle and ride along the Kamo River to the northern temples...\"}\n\n### Logistics Plan\n{\"daily_schedule\": [{\"day\": 1, \"activities\": [{\"time\": \"14:00\", \"activity\": \"Arrive KIX\", \"transit\": \"Haruka Express to Kyoto Station (75 min, ¥3,430)\"}]}, ...], \"transit_tips\": [\"Buy an ICOCA card at the airport for all local transit\", \"Kyoto Bus Day Pass (¥700) covers most tourist routes\", \"Walk between eastern Higashiyama temples — they are within 15 minutes of each other\"], \"time_optimization\": \"Group attractions by neighborhood to minimize transit. Eastern Kyoto (Kiyomizu, Gion, Philosopher's Path) in one day, western Kyoto (Arashiyama, Kinkaku-ji) in another.\"}",
  "session_id": "test-session-day7"
}

The base plan covers flights, hotels, and a day-by-day itinerary. Below the separator, three specialist sections provide targeted insights: a cost breakdown with savings tips, adventure recommendations with hidden gems, and a logistics plan with transit details. Each section is structured JSON that your frontend can parse and display however you like.

Part 5: Walk the trace

Open the mesh UI:

$ meshctl start --ui -d

Navigate to http://localhost:3080 and click the most recent trace. The call tree shows the fan-out pattern:

└─ plan_trip (planner-agent) [42871ms] ✓
   ├─ get_history (chat-history-agent) [2ms] ✓
   ├─ get_user_prefs (user-prefs-agent) [1ms] ✓
   ├─ claude_provider (claude-provider) [18451ms] ✓
   │  ├─ flight_search (flight-agent) [14ms] ✓
   │  │  └─ get_user_prefs (user-prefs-agent) [0ms] ✓
   │  ├─ hotel_search (hotel-agent) [1ms] ✓
   │  ├─ get_weather (weather-agent) [0ms] ✓
   │  └─ search_pois (poi-agent) [21ms] ✓
   │     └─ get_weather (weather-agent) [0ms] ✓
   ├─ budget_analysis (budget-analyst) [8204ms] ✓    ← parallel
   ├─ adventure_advice (adventure-advisor) [7891ms] ✓ ← parallel
   ├─ logistics_planning (logistics-planner) [8102ms] ✓ ← parallel
   ├─ save_turn (chat-history-agent) [1ms] ✓
   └─ save_turn (chat-history-agent) [1ms] ✓

The planner first generates the base plan (18s via Claude with tool calls), then fans out to the three specialists in parallel (~8s each, overlapping). Total wall-clock time for the specialists is about 8 seconds, not 24 – they run concurrently via asyncio.gather. Each specialist makes its own LLM call through the shared claude-provider.

!!! note “Structured outputs are validated at the edge” Each specialist’s Pydantic model acts as a contract. If a specialist’s LLM response does not match the schema, mesh retries the call automatically. The planner receives validated data every time – no defensive parsing needed. This is especially useful when specialists are developed by different teams: the model definition is the API contract.

Stop and clean up

$ meshctl stop

On Day 8 you’ll containerize the entire mesh with Docker Compose — local agents need to stop so Docker can use the same ports.

Troubleshooting

Specialist dependency not resolved. The planner shows 3/4 or fewer deps in meshctl list. Make sure all three specialist agents started successfully:

$ meshctl list | grep -E 'budget|adventure|logistics'

If a specialist is missing, check its logs:

$ meshctl logs budget-analyst

Common cause: the prompt template file path is wrong. The file:// path in @mesh.llm is relative to the agent’s working directory. Verify the prompts/ directory exists next to main.py.

Specialist returns raw text instead of JSON. The Pydantic return type requires the LLM to produce valid JSON. If the LLM ignores the schema instruction, check that max_iterations=1 is set and the prompt explicitly asks for JSON output. Mesh retries once on validation failure, but a fundamentally broken prompt will still fail.

asyncio.gather raises an exception from one specialist. If one specialist fails, asyncio.gather raises the first exception and cancels the others. This is Python’s default behavior. For production, consider wrapping each call in a try/except or using asyncio.gather(*tasks, return_exceptions=True) to collect partial results.

Timeouts on specialist calls. Each specialist makes an LLM call. If your provider is rate-limited, three parallel calls may hit the limit. Check your API key’s rate limits. As a fallback, you can call specialists sequentially instead of with asyncio.gather.

Recap

You added a committee of three specialist agents to the trip planner. Each specialist is an independent @mesh.llm agent with a Pydantic return type for structured output. The planner declares them as dependencies, calls them in parallel with asyncio.gather, and synthesizes their insights into the final response. No framework changes needed – the same dependency injection and LLM patterns you learned on Day 3 scale to multi-agent fan-out.

Next up

Day 8 containerizes the mesh – all thirteen agents in a single Docker Compose file with health checks and log aggregation.

Day 8 – Docker Compose

Until now you have been running agents individually with meshctl start. That is great for development – watch mode, instant restarts, granular control. But for integration testing and demo environments, you want one command that brings up the entire mesh. Today you will generate a Docker Compose file from your agent code and start everything with docker compose up.

What we’re building today

graph TB
    subgraph compose["docker compose up -d"]
        direction TB
        subgraph infra["Infrastructure"]
            PG[(postgres)]
            REG[registry :8000]
            UI[mesh-ui :3080]
        end
        subgraph obs["Observability"]
            RD[(redis)]
            TM[tempo]
            GR[grafana :3000]
        end
        subgraph agents["13 Agents"]
            GW[gateway :8080]
            CH[chat-history]
            PL[planner]
            CP[claude-provider]
            OP[openai-provider]
            FA[flight-agent]
            HA[hotel-agent]
            WA[weather-agent]
            PA[poi-agent]
            UP[user-prefs]
            BA[budget-analyst]
            AA[adventure-advisor]
            LP[logistics-planner]
        end
    end

    U[User] -->|"POST /plan"| GW
    U -->|"browse"| UI

    style U fill:#555,color:#fff
    style compose fill:#1a1a2e,color:#fff,stroke:#4a9eff
    style infra fill:#2d2d44,color:#fff,stroke:#666
    style obs fill:#2d2d44,color:#fff,stroke:#666
    style agents fill:#2d2d44,color:#fff,stroke:#666
    style GW fill:#e67e22,color:#fff
    style REG fill:#1abc9c,color:#fff
    style UI fill:#1abc9c,color:#fff
    style PG fill:#336791,color:#fff
    style RD fill:#d63031,color:#fff
    style TM fill:#f39c12,color:#fff
    style GR fill:#f39c12,color:#fff
    style PL fill:#9b59b6,color:#fff
    style CP fill:#9b59b6,color:#fff
    style OP fill:#9b59b6,color:#fff
    style BA fill:#f39c12,color:#fff
    style AA fill:#f39c12,color:#fff
    style LP fill:#f39c12,color:#fff
    style FA fill:#4a9eff,color:#fff
    style PA fill:#4a9eff,color:#fff
    style UP fill:#1a8a4a,color:#fff
    style WA fill:#1a8a4a,color:#fff
    style HA fill:#1a8a4a,color:#fff
    style CH fill:#1abc9c,color:#fff

One Docker Compose file. Thirteen agents, a registry, a database, the Mesh UI dashboard, and a full observability stack. Everything starts with a single command. Everything stops with a single command.

Today has five parts:

Generate the compose file – meshctl scaffold --compose --observability
Start the containerized mesh – docker compose up -d
Verify – meshctl list, curl the gateway, check health
Mesh UI tour – agents, topology, traces at localhost:3080
Stop and clean up – docker compose down

Part 1: Generate the compose file

Stop local agents

Day 7 stopped your local agents. If any are still running:

$ meshctl stop

Copy agents to a fresh directory

Create the Day 8 working directory with all thirteen agents:

$ mkdir -p trip-planner/day-08
$ cp -r day-07/* day-08/
$ cd day-08

Run the scaffold

$ meshctl scaffold --compose --observability

Scanning for agents...
Found 12 agent(s):
  - adventure-advisor (port 9111) in adventure-advisor/
  - budget-analyst (port 9110) in budget-analyst/
  - chat-history-agent (port 9109) in chat-history-agent/
  - claude-provider (port 9106) in claude-provider/
  - flight-agent (port 9101) in flight-agent/
  - hotel-agent (port 9102) in hotel-agent/
  - logistics-planner (port 9112) in logistics-planner/
  - openai-provider (port 9108) in openai-provider/
  - planner-agent (port 9107) in planner-agent/
  - poi-agent (port 9104) in poi-agent/
  - user-prefs-agent (port 9105) in user-prefs-agent/
  - weather-agent (port 9103) in weather-agent/

Successfully generated docker-compose.yml in .

Services included:
  - postgres (5432)
  - registry (8000)
  - redis (6379)
  - tempo (3200, 4317)
  - grafana (3000)
  - adventure-advisor (9111)
  - budget-analyst (9110)
  - chat-history-agent (9109)
  - claude-provider (9106)
  - flight-agent (9101)
  - hotel-agent (9102)
  - logistics-planner (9112)
  - openai-provider (9108)
  - planner-agent (9107)
  - poi-agent (9104)
  - user-prefs-agent (9105)
  - weather-agent (9103)

The scaffold scanned every subdirectory, found @mesh.agent decorators in twelve Python files, extracted each agent’s name and port, and generated a complete docker-compose.yml with infrastructure services, health checks, and networking.

It also generated observability configuration files:

.
├── docker-compose.yml
├── tempo.yaml
└── grafana/
    ├── grafana.ini
    ├── dashboards/
    │   └── mcp-mesh-overview.json
    └── provisioning/
        ├── dashboards/dashboards.yaml
        └── datasources/datasources.yaml

What about the gateway?

The scaffold detected twelve agents, not thirteen. The gateway uses @mesh.route on a FastAPI app – it is not a @mesh.agent class. The scaffold looks for @mesh.agent decorators to auto-detect agents, so the gateway needs to be added manually.

Add the gateway service to docker-compose.yml:

> *See the source code in the day's example directory.*

Add the Mesh UI

The scaffold does not include the Mesh UI dashboard. Add it after the registry service:

> *See the source code in the day's example directory.*

Pass API keys

The LLM providers need API keys. The scaffold does not know about your environment variables, so add them to the claude-provider and openai-provider services:

# In the claude-provider service environment:
ANTHROPIC_API_KEY: ${ANTHROPIC_API_KEY}

# In the openai-provider service environment:
OPENAI_API_KEY: ${OPENAI_API_KEY}

Make sure these variables are set in your shell or in a .env file next to docker-compose.yml.

!!! tip “meshctl DX” The compose file was generated from your agent code. You did not write it. When you add a new agent, re-run meshctl scaffold --compose and the compose file updates automatically. The scaffold merges new agents into the existing file without overwriting your manual additions like the gateway and API keys.

Part 2: Start the containerized mesh

Start everything

$ docker compose up -d

Docker pulls the images (first run only), starts the infrastructure, waits for health checks, and then starts all agents. The dependency ordering ensures postgres and redis are healthy before the registry starts, and the registry is healthy before agents start registering.

Check service status

$ docker compose ps

NAME                              STATUS         PORTS
trip-planner-postgres             Up (healthy)   0.0.0.0:5432->5432/tcp
trip-planner-redis                Up (healthy)   0.0.0.0:6379->6379/tcp
trip-planner-tempo                Up (healthy)   0.0.0.0:3200->3200/tcp, 0.0.0.0:4317->4317/tcp
trip-planner-grafana              Up (healthy)   0.0.0.0:3000->3000/tcp
trip-planner-registry             Up (healthy)   0.0.0.0:8000->8000/tcp
trip-planner-mesh-ui              Up (healthy)   0.0.0.0:3080->3080/tcp
trip-planner-gateway              Up (healthy)   0.0.0.0:8080->8080/tcp
trip-planner-flight-agent         Up (healthy)   0.0.0.0:9101->9101/tcp
trip-planner-hotel-agent          Up (healthy)   0.0.0.0:9102->9102/tcp
trip-planner-weather-agent        Up (healthy)   0.0.0.0:9103->9103/tcp
trip-planner-poi-agent            Up (healthy)   0.0.0.0:9104->9104/tcp
trip-planner-user-prefs-agent     Up (healthy)   0.0.0.0:9105->9105/tcp
trip-planner-claude-provider      Up (healthy)   0.0.0.0:9106->9106/tcp
trip-planner-planner-agent        Up (healthy)   0.0.0.0:9107->9107/tcp
trip-planner-openai-provider      Up (healthy)   0.0.0.0:9108->9108/tcp
trip-planner-chat-history-agent   Up (healthy)   0.0.0.0:9109->9109/tcp
trip-planner-budget-analyst       Up (healthy)   0.0.0.0:9110->9110/tcp
trip-planner-adventure-advisor    Up (healthy)   0.0.0.0:9111->9111/tcp
trip-planner-logistics-planner    Up (healthy)   0.0.0.0:9112->9112/tcp

All nineteen services running. Five infrastructure, one UI, thirteen agents.

View logs

$ docker compose logs -f --tail=20

Press ++ctrl+c++ to stop following. To view a single agent’s logs:

$ docker compose logs flight-agent

Part 3: Verify

Check agent registration

The registry is accessible at localhost:8000, the same address meshctl uses by default:

$ meshctl list

All thirteen agents should appear with their tools and dependencies resolved. The output is the same as when you ran them locally – meshctl does not know or care whether agents are running as local processes or containers.

Call the gateway

$ curl -s -X POST http://localhost:8080/plan \
    -H "Content-Type: application/json" \
    -H "X-Session-Id: compose-test-1" \
    -d '{"destination":"Kyoto","dates":"June 1-5, 2026","budget":"$2000"}' \
    | python -m json.tool

The response includes the full trip plan with specialist insights – budget analysis, adventure recommendations, and logistics planning. The same functionality as Day 7, now running entirely in containers.

Verify traces

$ meshctl trace --last

The trace shows the full call tree from the gateway through the planner to all tool agents – the same distributed trace pipeline from Day 3, now flowing through containerized agents.

Part 4: Mesh UI tour

Open http://localhost:3080 in your browser.

Dashboard

The main page shows an overview of your mesh: agent count, health status, and a traffic summary table. Real-time events stream in the sidebar – you will see agent registrations from the initial startup.

Topology

Click Topology in the sidebar. The topology view renders the full agent dependency graph. Nodes represent agents, edges represent dependencies. Color coding shows agent types:

Blue: tool agents (flight, hotel, weather, poi, user-prefs)
Purple: LLM agents (planner, claude-provider, openai-provider)
Gold: specialist agents (budget-analyst, adventure-advisor, logistics-planner)
Green: utility agents (chat-history-agent)
Orange: gateway

Hover over any node for details – runtime, version, capabilities, and endpoint.

Traffic

Click Traffic to see inter-agent call metrics. The top cards show aggregate stats: total calls, success rate, token usage, and data transferred. Below that, per-edge breakdowns show every agent-to-agent route with call counts, latency, and error rates.

After making a few /plan calls, you will see traffic flowing from the gateway through the planner to the LLM providers and tool agents.

Live

Click Live for real-time trace streaming. Make another /plan call and watch the spans appear in real time – which agent called which tool, on which target, with timing and status. Each trace can be expanded to see individual spans across the mesh.

Agents

Click Agents for a table of all registered agents. Each row shows name, type, runtime, version, dependency resolution status, and last seen time. Expand any row to see its capabilities, dependencies, and recent traces.

Part 5: Stop

$ docker compose down

This stops all containers and removes the network. Data volumes persist so the next docker compose up -d starts faster. To remove volumes too:

$ docker compose down -v

Troubleshooting

Docker build fails with missing requirements. The compose file uses mcpmesh/python-runtime:3.1.0 images with a dev-mode entrypoint that installs requirements.txt on startup. If an agent has dependencies not in the base image, check that requirements.txt exists in the agent directory and lists all dependencies.

Agent cannot connect to registry. Check that the agent’s MCP_MESH_REGISTRY_URL environment variable is set to http://registry:8000 (using the Docker service hostname, not localhost). Run docker compose logs <agent-name> to see connection errors.

Port conflict on startup. If you see “port is already allocated”, another process is using that port on your host. Either stop the conflicting process or change the host port mapping in docker-compose.yml. For example, change "8000:8000" to "8001:8000" to map the registry to port 8001 on your host.

Duplicate agent ports. If any two agents share the same http_port, Docker Compose will fail to start them – they’d bind to the same host port. Check your main.py files: each agent should have a unique port. If you used --port when scaffolding (as shown in earlier chapters), you’re already set.

API keys not passed to containers. LLM providers need ANTHROPIC_API_KEY and OPENAI_API_KEY. These must be set in your shell environment or in a .env file next to docker-compose.yml:

# .env
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...

Docker Compose automatically reads .env files.

Mesh UI not loading at localhost:3080. Verify the mesh-ui container is running: docker compose ps mesh-ui. Check its logs: docker compose logs mesh-ui. The UI needs the registry to be healthy before it starts.

Recap

You generated a Docker Compose file from your agent code with a single command. The scaffold detected twelve agents, extracted their names and ports, and produced a complete compose file with infrastructure, health checks, and observability. You added the gateway and Mesh UI manually, started everything with docker compose up -d, and verified the mesh works identically to the local setup. The Mesh UI dashboard gave you real-time visibility into agent topology, traffic, and traces.

Next up

Day 9 takes the mesh to Kubernetes with Helm charts.

Day 9 – Kubernetes

Your trip planner runs in Docker Compose. Today you deploy it to Kubernetes – the same agents, the same code, the same mesh. The only new file per agent is a Helm values file, and meshctl scaffold already created that on Day 1.

What we’re building today

graph TB
    subgraph k8s["Kubernetes — trip-planner namespace"]
        direction TB
        subgraph core["mcp-mesh-core (Helm)"]
            PG[(postgres)]
            REG[registry :8000]
            RD[(redis)]
            TM[tempo]
            GR[grafana :3000]
        end
        subgraph agents["13 Agents (Helm)"]
            GW[gateway :8080]
            CH[chat-history]
            PL[planner]
            CP[claude-provider]
            OP[openai-provider]
            FA[flight-agent]
            HA[hotel-agent]
            WA[weather-agent]
            PA[poi-agent]
            UP[user-prefs]
            BA[budget-analyst]
            AA[adventure-advisor]
            LP[logistics-planner]
        end
    end

    U[User] -->|"port-forward\nor ingress"| GW

    style U fill:#555,color:#fff
    style k8s fill:#1a1a2e,color:#fff,stroke:#4a9eff
    style core fill:#2d2d44,color:#fff,stroke:#666
    style agents fill:#2d2d44,color:#fff,stroke:#666
    style GW fill:#e67e22,color:#fff
    style REG fill:#1abc9c,color:#fff
    style PG fill:#336791,color:#fff
    style RD fill:#d63031,color:#fff
    style TM fill:#f39c12,color:#fff
    style GR fill:#f39c12,color:#fff
    style PL fill:#9b59b6,color:#fff
    style CP fill:#9b59b6,color:#fff
    style OP fill:#9b59b6,color:#fff
    style BA fill:#f39c12,color:#fff
    style AA fill:#f39c12,color:#fff
    style LP fill:#f39c12,color:#fff
    style FA fill:#4a9eff,color:#fff
    style PA fill:#4a9eff,color:#fff
    style UP fill:#1a8a4a,color:#fff
    style WA fill:#1a8a4a,color:#fff
    style HA fill:#1a8a4a,color:#fff
    style CH fill:#1abc9c,color:#fff

One namespace. Two Helm charts (mcp-mesh-core for infrastructure, mcp-mesh-agent for each agent). Thirteen agents, a registry, a database, and a full observability stack. Same agents as Day 8 – running in Kubernetes pods instead of Docker containers.

Today has five parts:

The DDDI payoff – same code, new platform
Create the namespace and secrets – one-time setup
Deploy the registry and infrastructure – helm install mcp-core
Deploy the agents – one helm install per agent
Verify – kubectl get pods, meshctl list, curl the gateway

The DDDI payoff

Open your Day 8 flight agent and your Day 9 flight agent side by side.

$ diff day-08/python/flight-agent/main.py day-09/python/flight-agent/main.py

80c80
<     description="TripPlanner flight search tool -- Day 8",
---
>     description="TripPlanner flight search tool -- Day 9",

One line changed: the description string. The flight_search function – its parameters, its return type, its stub data – is identical. The imports are identical. The decorators are identical. The function you wrote on Day 1 and evolved through Day 8 runs on Kubernetes without a single code change.

Remember that helm-values.yaml file from Day 1 that you ignored?

> *See the source code in the day's example directory.*

That is the Kubernetes deployment manifest for your flight agent. The scaffold generated it on Day 1. It tells the Helm chart which image to pull, what to name the agent, and how many resources to give it. The chart handles the rest: Deployment, Service, health probes, environment variables, service account.

No env-specific config files. No sidecars. No wrapper code. The function you wrote on Day 1 runs here.

Prerequisites

A Kubernetes cluster (minikube, kind, EKS, GKE, AKS)
kubectl configured for your cluster
Helm 3.8+ (OCI registry support)
Agent images built and available to the cluster

For minikube, use minikube’s Docker daemon so images are available locally without pushing to a registry:

$ eval $(minikube docker-env)

Part 1: Build agent images

Each agent has a Dockerfile (generated by meshctl scaffold) that uses the official mcpmesh/python-runtime base image. Build all thirteen agents:

$ cd day-09/python

$ for agent in flight-agent hotel-agent weather-agent poi-agent \
    user-prefs-agent chat-history-agent claude-provider openai-provider \
    planner-agent gateway budget-analyst adventure-advisor logistics-planner
do
  echo "Building $agent..."
  docker build -t "trip-planner/${agent}:latest" "$agent/"
done

Verify the images are available:

$ docker images --filter "reference=trip-planner/*" --format "table {{.Repository}}\t{{.Tag}}\t{{.Size}}"

REPOSITORY                         TAG       SIZE
trip-planner/flight-agent          latest    409MB
trip-planner/hotel-agent           latest    409MB
trip-planner/weather-agent         latest    409MB
trip-planner/poi-agent             latest    409MB
trip-planner/user-prefs-agent      latest    409MB
trip-planner/chat-history-agent    latest    409MB
trip-planner/claude-provider       latest    409MB
trip-planner/openai-provider       latest    409MB
trip-planner/planner-agent         latest    409MB
trip-planner/gateway               latest    409MB
trip-planner/budget-analyst        latest    409MB
trip-planner/adventure-advisor     latest    409MB
trip-planner/logistics-planner     latest    409MB

!!! tip “Cloud clusters” For EKS, GKE, or AKS, push images to your container registry instead: shell docker buildx build --platform linux/amd64 \ -t your-registry/flight-agent:v1.0.0 --push flight-agent/ Then update image.repository in each values file.

Part 2: Create the namespace and secrets

$ kubectl create namespace trip-planner

namespace/trip-planner created

LLM agents need API keys. Create a Kubernetes Secret:

$ kubectl -n trip-planner create secret generic llm-keys \
    --from-literal=ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY \
    --from-literal=OPENAI_API_KEY=$OPENAI_API_KEY

secret/llm-keys created

The Helm values files for LLM agents reference this secret by name:

> *See the source code in the day's example directory.*

The secretKeyRef mounts the key as an environment variable inside the pod. The agent code reads ANTHROPIC_API_KEY from the environment – the same way it did locally. No code change needed.

Part 3: Deploy the registry

The mcp-mesh-core chart deploys the registry, PostgreSQL, Redis, Tempo, and Grafana as a single Helm release:

$ helm install mcp-core oci://ghcr.io/dhyansraj/mcp-mesh/mcp-mesh-core \
    --version 3.1.0 \
    -n trip-planner \
    -f helm/values-core.yaml \
    --wait --timeout 5m

Wait for the registry to become available:

$ kubectl wait --for=condition=available \
    deployment/mcp-core-mcp-mesh-registry \
    -n trip-planner --timeout=120s

deployment.apps/mcp-core-mcp-mesh-registry condition met

Part 4: Deploy the agents

Each agent gets its own helm install using the mcp-mesh-agent chart and the values file from helm/:

$ AGENTS=(
    flight-agent hotel-agent weather-agent poi-agent user-prefs-agent
    chat-history-agent claude-provider openai-provider planner-agent
    gateway budget-analyst adventure-advisor logistics-planner
  )

$ for agent in "${AGENTS[@]}"; do
    echo "Installing $agent..."
    helm install "$agent" \
      oci://ghcr.io/dhyansraj/mcp-mesh/mcp-mesh-agent \
      --version 3.1.0 \
      -n trip-planner \
      -f "helm/values-${agent}.yaml"
  done

Installing flight-agent...
Installing hotel-agent...
Installing weather-agent...
Installing poi-agent...
Installing user-prefs-agent...
Installing chat-history-agent...
Installing claude-provider...
Installing openai-provider...
Installing planner-agent...
Installing gateway...
Installing budget-analyst...
Installing adventure-advisor...
Installing logistics-planner...

!!! tip “minikube image pull” If you built images with eval $(minikube docker-env), add --set image.pullPolicy=Never to each helm install so Kubernetes uses the local images instead of trying to pull from a registry.

Port strategy

On Day 8, each agent had a unique port (9101, 9102, …) because all containers shared the host network. In Kubernetes, each pod has its own IP address, so every agent listens on port 8080. The Helm chart sets MCP_MESH_HTTP_PORT=8080 as an environment variable, which overrides the http_port in the @mesh.agent decorator. Your code does not change.

Part 5: Verify

Check pods

$ kubectl -n trip-planner get pods

NAME                                                 READY   STATUS    AGE
adventure-advisor-mcp-mesh-agent-b5fcb5d9-tw48r      1/1     Running   30s
budget-analyst-mcp-mesh-agent-6cdfc8c5c5-bmr9d       1/1     Running   30s
chat-history-agent-mcp-mesh-agent-57b497ffc9-6dgd4   1/1     Running   30s
claude-provider-mcp-mesh-agent-55756498b9-9sndc      1/1     Running   30s
flight-agent-mcp-mesh-agent-5df865b559-jc6cx         1/1     Running   30s
gateway-mcp-mesh-agent-79cbcf7d88-wxng4              1/1     Running   30s
hotel-agent-mcp-mesh-agent-94d8f8b8-dnfh8            1/1     Running   30s
logistics-planner-mcp-mesh-agent-5db8d9555-ndjff     1/1     Running   30s
mcp-core-mcp-mesh-grafana-6d7b9f68d6-rhbqx           1/1     Running   6m
mcp-core-mcp-mesh-postgres-0                         1/1     Running   6m
mcp-core-mcp-mesh-redis-7df8848cb7-bdlqs             1/1     Running   6m
mcp-core-mcp-mesh-registry-8448c85b75-4p9h7          1/1     Running   6m
mcp-core-mcp-mesh-tempo-5d8d4cbb49-gmqpd             1/1     Running   6m
openai-provider-mcp-mesh-agent-7cfd4b55bb-stqwr      1/1     Running   30s
planner-agent-mcp-mesh-agent-54876f44f4-6cp87        1/1     Running   30s
poi-agent-mcp-mesh-agent-b7fcf4864-gmslk             1/1     Running   30s
user-prefs-agent-mcp-mesh-agent-c4746c7c8-vz5bh      1/1     Running   30s
weather-agent-mcp-mesh-agent-875b6477c-wvrkv         1/1     Running   30s

Eighteen pods: five infrastructure, thirteen agents. All 1/1 Running.

Check services

$ kubectl -n trip-planner get svc

Every agent has a ClusterIP service on port 8080. The gateway has a NodePort service so you can reach it from outside the cluster.

Check agent registration

Port-forward the registry and use meshctl list:

$ kubectl -n trip-planner port-forward svc/mcp-core-mcp-mesh-registry 8000:8000 &

$ meshctl list --registry-url http://localhost:8000

Registry: running (http://localhost:8000) - 13 healthy

NAME                        RUNTIME  TYPE    STATUS   DEPS  ENDPOINT
adventure-advisor-491aeceb  Python   Agent   healthy  0/0   adventure-advisor-mcp-mesh-agent.trip-planner:8080
budget-analyst-bbde0bf2     Python   Agent   healthy  0/0   budget-analyst-mcp-mesh-agent.trip-planner:8080
chat-history-agent-e6fe4291 Python   Agent   healthy  0/0   chat-history-agent-mcp-mesh-agent.trip-planner:8080
claude-provider-de41d665    Python   Agent   healthy  0/0   claude-provider-mcp-mesh-agent.trip-planner:8080
flight-agent-b5a0bfb6       Python   Agent   healthy  1/1   flight-agent-mcp-mesh-agent.trip-planner:8080
gateway-api-b7080b01        Python   API     healthy  1/1   gateway-mcp-mesh-agent.trip-planner:8080
hotel-agent-db0a6b18        Python   Agent   healthy  0/0   hotel-agent-mcp-mesh-agent.trip-planner:8080
logistics-planner-5fd4a0e7  Python   Agent   healthy  0/0   logistics-planner-mcp-mesh-agent.trip-planner:8080
openai-provider-b32513de    Python   Agent   healthy  0/0   openai-provider-mcp-mesh-agent.trip-planner:8080
planner-agent-9b662efc      Python   Agent   healthy  5/5   planner-agent-mcp-mesh-agent.trip-planner:8080
poi-agent-2ccdd8e5          Python   Agent   healthy  1/1   poi-agent-mcp-mesh-agent.trip-planner:8080
user-prefs-agent-3bfc1af9   Python   Agent   healthy  0/0   user-prefs-agent-mcp-mesh-agent.trip-planner:8080
weather-agent-b8c26c65      Python   Agent   healthy  0/0   weather-agent-mcp-mesh-agent.trip-planner:8080

Thirteen agents, all healthy. The planner resolves all five dependencies (5/5). The gateway resolves its single dependency (1/1). Endpoints use Kubernetes DNS names – <service>.<namespace>:<port> – which resolve automatically within the cluster.

Call the gateway

Port-forward the gateway and send a request:

$ kubectl -n trip-planner port-forward svc/gateway-mcp-mesh-agent 8080:8080 &

$ curl -s http://localhost:8080/health

{"status": "healthy"}

$ curl -s -X POST http://localhost:8080/plan \
    -H "Content-Type: application/json" \
    -H "X-Session-Id: k8s-test-1" \
    -d '{"destination":"Kyoto","dates":"June 1-5, 2026","budget":"$2000"}' \
    | python -m json.tool

The response includes the full trip plan with specialist insights – the same output you saw on Day 7 and Day 8, now served from Kubernetes pods.

Call a tool directly

You can also call individual tools through the registry, the same way you did on Day 1:

$ meshctl call flight_search \
    '{"origin":"SFO","destination":"NRT","date":"2026-06-01"}' \
    --registry-url http://localhost:8000

{
  "result": [
    {
      "carrier": "MH",
      "flight": "MH007",
      "origin": "SFO",
      "destination": "NRT",
      "date": "2026-06-01",
      "depart": "09:15",
      "arrive": "14:40",
      "price_usd": 842
    },
    {
      "carrier": "SQ",
      "flight": "SQ017",
      "origin": "SFO",
      "destination": "NRT",
      "date": "2026-06-01",
      "depart": "11:50",
      "arrive": "17:05",
      "price_usd": 901
    }
  ]
}

The same stub data. The same function. Running in a Kubernetes pod.

Optional: Ingress

Instead of port-forwarding, you can expose the gateway via Ingress. On minikube, enable the ingress addon:

$ minikube addons enable ingress

Apply the ingress manifest:

$ kubectl apply -f k8s/ingress-gateway.yaml

> *See the source code in the day's example directory.*

Add the hostname to your /etc/hosts:

$ echo "$(minikube ip) trip-planner.local" | sudo tee -a /etc/hosts

Then call the gateway via the ingress:

$ curl -s http://trip-planner.local/health

What changed from Day 8

Aspect	Day 8 (Docker Compose)	Day 9 (Kubernetes)
Agent code	Identical	Identical
Orchestrator	`docker compose up`	`helm install`
Port strategy	Unique ports (9101, 9102…)	All agents on 8080
Secrets	`.env` file	Kubernetes Secret
Networking	Docker bridge network	Kubernetes DNS
Health probes	Docker health checks	k8s liveness/readiness
Scaling	Manual (`docker compose up --scale`)	`kubectl scale` or HPA

The agent code column is the important one. It says “Identical” twice.

Clean up

$ helm uninstall gateway -n trip-planner
$ helm uninstall planner-agent -n trip-planner
$ # ... (repeat for all agents, or use the teardown script)

$ # Or use the provided teardown script:
$ ./helm/teardown.sh

The teardown script uninstalls all Helm releases and deletes the namespace:

$ ./helm/teardown.sh

=== Uninstalling agents ===
  Removed flight-agent
  Removed hotel-agent
  ...
=== Uninstalling core ===
  Removed mcp-core
=== Deleting namespace ===
namespace "trip-planner" deleted
=== Done ===

Troubleshooting

Image pull errors. On minikube, build images inside minikube’s Docker daemon (eval $(minikube docker-env)) and set image.pullPolicy=Never in the Helm install. On cloud clusters, push images to your container registry and update image.repository in the values files.

Pod in CrashLoopBackOff. Check the logs:

$ kubectl -n trip-planner logs <pod-name>

Common causes: missing secrets (the llm-keys Secret was not created), missing dependencies (Redis not ready before chat-history-agent starts), or import errors in agent code.

meshctl list shows no agents. Make sure the registry port-forward is running:

$ kubectl -n trip-planner port-forward svc/mcp-core-mcp-mesh-registry 8000:8000 &
$ meshctl list --registry-url http://localhost:8000

Gateway returns “capability unavailable”. The planner or its dependencies have not registered yet. Wait 30 seconds for all agents to complete registration, then retry.

Ingress not working. Verify the ingress controller is running:

$ minikube addons enable ingress
$ kubectl get pods -n ingress-nginx

Check the ingress resource:

$ kubectl -n trip-planner describe ingress trip-planner-gateway

Recap

You deployed all thirteen trip planner agents to Kubernetes using two Helm charts: mcp-mesh-core for infrastructure and mcp-mesh-agent for each agent. The agent code is identical to Day 8. The only new files are the Helm values files – and meshctl scaffold generated those on Day 1.

The DDDI pattern delivered on its promise: the function you wrote on Day 1 runs in Kubernetes without modification. The decorators handle registration. The Helm chart handles deployment. The registry handles discovery. Your code handles your business logic.

Next up

Day 10 wraps up the tutorial – a celebration of what you built, production readiness pointers, and open-ended challenges for where to go from here.

Day 10 – What You Built and Where to Go

Ten days ago you scaffolded a single tool agent. Today you have a 13-agent trip planner running on Kubernetes with LLM-driven planning, a committee of specialists, chat history, distributed tracing, and an HTTP API. Let’s take stock of what you built, cover a few production essentials, and look at where to go from here.

Part 1: What you built

By the numbers

Metric	Count
Agents	13 – 5 tool agents, 2 LLM providers, 1 planner, 3 specialists, 1 gateway, 1 chat history
LLM providers	2 with automatic failover (Claude + OpenAI)
Dependency patterns	Tier-1 (direct) and tier-2 (transitive)
Chat backend	Multi-turn conversations with Redis
Structured outputs	Committee aggregation via Pydantic models
Deployment targets	Docker Compose + Kubernetes with Helm
Observability	Distributed tracing via `meshctl trace`, Grafana dashboards, Tempo

The final architecture

graph TB
    subgraph k8s["Kubernetes -- trip-planner namespace"]
        direction TB
        subgraph core["mcp-mesh-core (Helm)"]
            PG[(postgres)]
            REG[registry :8000]
            RD[(redis)]
            TM[tempo]
            GR[grafana :3000]
        end
        subgraph agents["13 Agents (Helm)"]
            GW[gateway :8080]
            CH[chat-history]
            PL[planner]
            CP[claude-provider]
            OP[openai-provider]
            FA[flight-agent]
            HA[hotel-agent]
            WA[weather-agent]
            PA[poi-agent]
            UP[user-prefs]
            BA[budget-analyst]
            AA[adventure-advisor]
            LP[logistics-planner]
        end
    end

    U[User] -->|"port-forward\nor ingress"| GW

    style U fill:#555,color:#fff
    style k8s fill:#1a1a2e,color:#fff,stroke:#4a9eff
    style core fill:#2d2d44,color:#fff,stroke:#666
    style agents fill:#2d2d44,color:#fff,stroke:#666
    style GW fill:#e67e22,color:#fff
    style REG fill:#1abc9c,color:#fff
    style PG fill:#336791,color:#fff
    style RD fill:#d63031,color:#fff
    style TM fill:#f39c12,color:#fff
    style GR fill:#f39c12,color:#fff
    style PL fill:#9b59b6,color:#fff
    style CP fill:#9b59b6,color:#fff
    style OP fill:#9b59b6,color:#fff
    style BA fill:#f39c12,color:#fff
    style AA fill:#f39c12,color:#fff
    style LP fill:#f39c12,color:#fff
    style FA fill:#4a9eff,color:#fff
    style PA fill:#4a9eff,color:#fff
    style UP fill:#1a8a4a,color:#fff
    style WA fill:#1a8a4a,color:#fff
    style HA fill:#1a8a4a,color:#fff
    style CH fill:#1abc9c,color:#fff

One namespace. Two Helm charts. Thirteen agents, a registry, a database, and a full observability stack – the same Python functions you wrote on Day 1, running in Kubernetes pods.

The journey, day by day

Day	What you built	Key concept
1	`flight_search` – a single tool agent	`meshctl scaffold`, `@mesh.tool`
2	5 tool agents wired together	Dependency injection, capabilities
3	LLM planner with Jinja templates	`@mesh.llm`, observability, `meshctl trace`
4	Claude + OpenAI with automatic failover	Tag routing (`+claude`), tier-1/tier-2
5	FastAPI chat gateway	`@mesh.route`, HTTP integration
6	Redis-backed chat history	Persistent conversations, session management
7	Committee of specialists	Structured outputs, multi-agent coordination
8	Docker Compose deployment	Containerized agents, `meshctl scaffold --compose`
9	Kubernetes with Helm	Helm charts, ingress, production observability
10	You are here	Production readiness, what’s next

Every day added capability without rewriting what came before. The flight_search function from Day 1 is the same function running on Kubernetes on Day 9.

The code you didn’t write

Over ten days you focused on business logic – the trip planning domain. Here is what you never had to build:

No REST clients or HTTP handlers for inter-agent communication
No service discovery code
No environment-specific configuration files
No sidecars or proxy containers
No LLM vendor SDK imports in the planner
No serialization/deserialization code for tool calls

The flight_search function from Day 1 runs on Kubernetes unchanged. Same file, same decorators, same types. The mesh handled registration, discovery, routing, failover, and observability – your code handled flights, hotels, weather, and trip plans.

Part 2: Production readiness

TripPlanner is functional, but a production deployment needs a few more layers. Each item below is a brief pointer with a link to the full documentation – not a deep-dive.

Security

MCP Mesh provides three layers of security: registration trust (who can join the mesh), agent-to-agent mTLS (encrypted inter-agent calls), and authorization (who can do what).

Registration trust – the registry validates agent identity via TLS certificates before accepting registration. Supports file-based certs, HashiCorp Vault PKI, and SPIRE workload identity.
Agent-to-agent mTLS – every inter-agent call is mutually authenticated. The same certificate used for registration handles peer auth – no additional configuration.
Authorization – MCP Mesh propagates HTTP headers end-to-end through the mesh. Use your platform’s auth framework (FastAPI middleware, Spring Security, Express middleware) to enforce access control.
Entity management – meshctl entity register, meshctl entity list, and meshctl entity revoke control which organizational CAs are trusted.

Full details: Security documentation

Observability

The observability stack you deployed on Day 9 (Tempo + Grafana) is ready for production monitoring:

Distributed tracing – every tool call, LLM invocation, and inter-agent hop is traced. Use meshctl trace locally or Grafana’s Tempo datasource in Kubernetes.
Dashboards – Grafana ships with pre-configured views for latency, error rates, and queue depth.
Alerting – connect Grafana alerting to Slack, PagerDuty, or email for latency spikes or error rate thresholds.

Full details: Observability documentation

Resource limits

Set CPU and memory limits in your Helm values files. You already have helm-values.yaml per agent from Day 9 – add resource blocks:

agent:
  resources:
    requests:
      cpu: 100m
      memory: 128Mi
    limits:
      cpu: 500m
      memory: 512Mi

Health probes

Mesh agents expose health endpoints automatically (/health). The Helm chart wires liveness and readiness probes to this endpoint – no configuration needed. If an agent becomes unhealthy, Kubernetes restarts it and the registry removes it from the topology within one heartbeat cycle.

Secrets management

Day 9 used kubectl create secret for LLM API keys. For production, move to a secrets operator:

external-secrets-operator – syncs secrets from Vault, AWS Secrets Manager, or GCP Secret Manager into Kubernetes secrets.
sealed-secrets – encrypt secrets in Git, decrypt at deploy time.

Horizontal scaling

Tool agents are stateless – run multiple replicas for throughput. The mesh routes calls to any healthy instance automatically:

agent:
  replicaCount: 3

LLM providers and the planner can also scale horizontally. The chat history agent is stateless too (state lives in Redis). The gateway scales behind a Kubernetes Service or Ingress.

Part 3: Challenges

The tutorial is complete, but TripPlanner is a starting point. Here are ideas to explore on your own – each one exercises a different part of the mesh.

Add OAuth authentication to the gateway

Protect the /plan endpoint with JWT tokens. Use FastAPI’s HTTPBearer dependency to validate tokens, and configure MCP_MESH_PROPAGATE_HEADERS to forward the Authorization header through the mesh so downstream agents can see the caller’s identity. See the authorization documentation for the header propagation pattern.

Integrate RAG with a knowledge-base agent

Scaffold a new agent that retrieves destination guides from a vector store (Pinecone, Weaviate, pgvector). Inject the retrieved context into the planner’s prompt template as an additional variable. The planner already supports Jinja templates – add a {{ destination_context }} block and wire the knowledge agent as a tier-1 dependency.

Add a Gemini provider

Scaffold a third LLM provider with meshctl scaffold. Register it with capability="llm" and tags=["gemini"]. Deploy all three providers and benchmark them on the same trip query. The planner’s +claude tag routing gives Claude priority, but if you stop Claude and Gemini, traffic fails over to OpenAI – test it.

Build a price monitor

Create a scheduled agent that checks flight prices daily (expand the flight_search stub with real API calls or a richer simulation). When prices drop below a user-defined threshold, write an alert to a new price_alerts capability. Wire a notification agent that reads alerts and sends messages via email or Slack.

Swap a Python agent for TypeScript

Rewrite weather-agent in TypeScript using the TypeScript SDK. Start it alongside the Python agents. The planner doesn’t know or care what language the weather agent is written in – it discovers capabilities, not implementations. Verify everything works with meshctl call get_weather.

Add structured logging

Configure JSON logging in your agents (Python’s structlog or the standard logging module with a JSON formatter). Include the trace_id from mesh headers so log lines correlate with distributed traces. Ship logs to Grafana Loki and cross-reference with Tempo traces for full request-level observability.

Build a streaming mobile UI

Already built — see the Day 10 Bonus — Streaming UI chapter. It takes the buffered Day 9 mesh and makes the user-visible Claude response stream live, token by token, into a mobile-first React UI. Two file changes (planner + gateway) plus a single HTML file. The deepest pipeline mcp-mesh ships, end to end.

The finished product

Add a modern web UI, wire in Google authentication, and your ten days of work becomes a production-ready AI application. Not a demo. Not a prototype. A real, multi-user trip planner backed by thirteen mesh agents, specialist AI committees, multi-turn chat, automatic LLM failover, and distributed tracing – deployable to Kubernetes with a single helm install.

Google OAuth login {: .app-screen }

{: .app-screen }

AI-generated itinerary {: .app-screen }

Specialist insights {: .app-screen }

Ten days. Thirteen agents. Three LLM providers. One framework. You went from meshctl scaffold to a Kubernetes-deployed, multi-user AI application – and the flight_search function you wrote in the first hour of Day 1 is still running, unchanged, in a production pod. No rewrites. No migration layer. No “now let’s port it to the real stack.” The code you wrote is the real stack. That is what MCP Mesh was built for, and you just proved it works.

Thank you

That’s the TripPlanner tutorial. You started with a single Python function and ended with a 13-agent system running on Kubernetes – with LLM planning, committee refinement, chat history, distributed tracing, and an HTTP API. Every agent is a plain Python file. Every deployment target uses the same code. The mesh handled the infrastructure so you could focus on the domain.

If you have questions, ideas, or feedback, find us on Discord or GitHub. We’d love to see what you build.

The TripPlanner Tutorial

What you’ll have built by Day 10

The Day 10 architecture

The arc

Prerequisites

Start Day 1

Things worth noticing along the way

Prerequisites

Supported platforms

meshctl

Verify

Language runtime

Python 3.11 or later

Virtual environment

Verify

Ready to start

Day 1 — Scaffold and First Tool Agent

What we’re building today

Step 1: Scaffold the agent

Step 2: Write the tool

Step 3: Start the agent

Step 4: Start the UI

Step 5: Inspect the mesh

Step 6: Call the tool

Stop and clean up

Troubleshooting

Recap

See also

Next up

Day 2 — More Tools and Dependency Injection

What we’re building today

Step 1: Scaffold the new agents

Step 2: Write the tools

Standalone tools: hotel, weather, user-prefs

DI tools: flight-agent (updated) and poi-agent (new)

Step 3: Start all agents

Step 4: Start the UI

Step 5: Inspect the mesh

Step 6: Call a tool with dependency injection

Stop and clean up

Troubleshooting

Recap

See also

Next up

Day 3 – Observability and LLM Integration

What we’re building today

Part 1: Set up distributed tracing

Generate the compose file

Start the stack

Part 2: Register an LLM provider

Scaffold the provider

Start the provider with all Day 2 agents

Part 3: Build the planner agent

The prompt template

The planner code

Start the planner

Start the UI

Part 4: Call the planner

Part 5: Walk the trace

Leave it running

Troubleshooting

Recap

See also

Next up

Day 4 – Multiple Providers and Dependency Tiers

What we’re building today

Part 1: Add a second LLM provider

Scaffold the provider

Start the provider

Part 2: Provider tags and preference routing

Update the planner’s provider selection

Part 3: Provider swap – zero code changes

Call 1: Claude is preferred and available

Call 2: Stop Claude, watch failover

Call 3: Restart Claude, verify preference

Part 4: Connect the planner to tool agents

Tier-1: prefetch dependencies

Tier-2: LLM-discoverable tools

The two tiers together

The updated prompt template