MCP servers scale in a way that punishes success.
A server with ten tools works beautifully. The LLM sees all ten schemas, picks the right one, calls it. A server with two hundred tools dumps two hundred schemas into the context window before the LLM reads a single word of the user’s request: tens of thousands of tokens, most of them irrelevant.
The execution model compounds the problem. Every tool call is a round-trip. The LLM calls a tool, the result passes back through the context window, the LLM reasons about it, calls another tool. Intermediate results that only exist to feed the next step burn tokens flowing through the model on every turn.
The code mode pattern, introduced by Cloudflare and explored by Anthropic, addresses both problems at once: instead of calling tools one at a time, the LLM writes a script that composes them. Search for what’s available, write code, execute it in a sandbox. The intermediate results stay inside the sandbox. The context window stays clean. Cloudflare recently shipped a server-side implementation for their own API: two tools covering 2,500 endpoints in roughly 1,000 tokens.
FastMCP 3.1 ships server-side code mode with fully configurable discovery, and the server-side part matters more than it sounds.
## CodeMode
Here’s a normal FastMCP server with CodeMode applied:
```python
from fastmcp import FastMCP
from fastmcp.experimental.transforms.code_mode import CodeMode

mcp = FastMCP("Server", transforms=[CodeMode()])

@mcp.tool
def add(x: int, y: int) -> int:
    """Add two numbers."""
    return x + y

@mcp.tool
def multiply(x: int, y: int) -> int:
    """Multiply two numbers."""
    return x * y
```

The only difference from a standard server is `transforms=[CodeMode()]`. The tool functions stay the same. But clients connecting to this server no longer see `add` and `multiply` directly; they see the meta-tools that CodeMode provides: tools for discovering what’s available and for writing code that calls them.
The default flow has three stages. Granted, three stages might sound like a lot for something intended to reduce server round-trips. The original code mode pattern, introduced by Cloudflare, had no discovery phase at all: clients loaded every tool definition into context, then executed code against them. This solved the sequential calling problem but not the context bloat problem. Anthropic introduced a two-stage approach: search for relevant tools, then execute. This addressed both problems.
For servers complex enough to need code mode, we’ve found that an additional stage makes a meaningful difference. Separating search from schema retrieval lets the search tool stay lightweight, returning only names and brief descriptions, while a dedicated schema step provides the precision the LLM needs to write correct code. But if you want something else, FastMCP permits full customization of this flow to have as few or as many stages as you need.
Here’s how the three default stages play out with the server above:
First, the LLM searches. It calls `search(query="math numbers")` and gets back tool names and descriptions: a lightweight index. Instead of loading two hundred schemas, it sees a few lines of text about the tools that match.
Next, it requests parameter details for the tools it found. `get_schema(tools=["add", "multiply"])` returns parameter names, types, and required markers. Not the full JSON schema (by default), but enough to write code against.
Finally, it writes a Python script and executes it in a sandbox:
```python
a = await call_tool("add", {"x": 3, "y": 4})
b = await call_tool("multiply", {"x": a, "y": 2})
return b
```

Three round-trips: search, schema, execute. The intermediate result (`a`) never enters the context window. `call_tool` is the only function available inside the sandbox; no filesystem, no network, just tool calls and Python.
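To make the three stages concrete, here is a self-contained simulation in plain Python. Nothing below is FastMCP's actual implementation: the registry, `search`, `get_schema`, and `call_tool` helpers are invented for illustration, and the final script runs synchronously rather than with `await`.

```python
# Illustrative simulation of the three-stage flow: search -> schema -> execute.
# All helpers here are invented for the example, not FastMCP internals.
import inspect

def add(x: int, y: int) -> int:
    """Add two numbers."""
    return x + y

def multiply(x: int, y: int) -> int:
    """Multiply two numbers."""
    return x * y

REGISTRY = {"add": add, "multiply": multiply}

def search(query: str) -> list[str]:
    """Stage 1: names plus one-line descriptions that match the query."""
    words = set(query.lower().split())
    return [
        f"{name}: {fn.__doc__}"
        for name, fn in REGISTRY.items()
        if words & set(fn.__doc__.lower().replace(".", "").split())
    ]

def get_schema(tools: list[str]) -> dict[str, list[str]]:
    """Stage 2: parameter names and types, not the full JSON Schema."""
    return {
        name: [
            f"{p.name}: {p.annotation.__name__}"
            for p in inspect.signature(REGISTRY[name]).parameters.values()
        ]
        for name in tools
    }

def call_tool(name: str, params: dict):
    """Stage 3: the only function visible inside the sandbox."""
    return REGISTRY[name](**params)

# The script the LLM would execute. The intermediate `a` stays in the sandbox.
a = call_tool("add", {"x": 3, "y": 4})
result = call_tool("multiply", {"x": a, "y": 2})
```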
## Discovery
The three-stage flow is the default. CodeMode’s discovery surface is fully configurable, because different tool catalogs need different approaches.
CodeMode ships four discovery tools. All of them share a tunable detail level that controls how much information each response includes:
| Level | Output | Token cost |
|---|---|---|
| `"brief"` | Tool names and one-line descriptions | Cheapest |
| `"detailed"` | Compact markdown with parameter names, types, and required markers | Medium |
| `"full"` | Complete JSON Schema | Most expensive |
This is significant. Even ListTools, which dumps the entire catalog, can produce substantially fewer tokens than a standard MCP handshake when set to "brief" or "detailed". A standard tools/list response includes the full JSON Schema for every tool: argument names, types, nested objects, descriptions, constraints. ListTools at "brief" returns just names and descriptions. The context dump tax is still there, but it’s a fraction of what it would be, and the sequential calling tax is eliminated entirely because tool calls happen inside the sandbox.
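To see why the levels differ in cost, here is a hedged sketch of what each one might render for a single tool. The helper functions are invented for the example (FastMCP's real renderers are not shown in this post); the point is the rough size ordering.

```python
# Invented renderers for the three detail levels -- not FastMCP's code.
import inspect
import json

def add(x: int, y: int) -> int:
    """Add two numbers."""
    return x + y

def brief(fn) -> str:
    # "brief": tool name plus one-line description only.
    return f"{fn.__name__}: {fn.__doc__}"

def detailed(fn) -> str:
    # "detailed": compact markdown with parameter names and types.
    params = ", ".join(
        f"{p.name}: {p.annotation.__name__}"
        for p in inspect.signature(fn).parameters.values()
    )
    return f"**{fn.__name__}**({params}): {fn.__doc__}"

def full(fn) -> str:
    # "full": a complete JSON Schema for the arguments (minimal hand-rolled
    # version here; real schemas also carry descriptions and constraints).
    names = [p.name for p in inspect.signature(fn).parameters.values()]
    schema = {
        "type": "object",
        "properties": {n: {"type": "integer"} for n in names},
        "required": names,
    }
    return json.dumps(schema)
```

Even for a two-parameter tool the `"full"` output is several times the size of `"brief"`; multiplied across hundreds of tools, that gap is the context dump tax.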
By default, two discovery tools are enabled:
**Search** finds tools by natural-language query using BM25 ranking. Defaults to `"brief"` detail. The LLM can override the detail level per call, requesting `"detailed"` for inline schemas or `"full"` for the complete JSON Schema.
**GetSchemas** takes a list of tool names and returns parameter details. Defaults to `"detailed"`. The fallback for when search results aren’t enough to write code against.
Two more are opt-in:
**ListTools** dumps the entire catalog. At `"brief"` detail, this is a lightweight alternative to standard MCP tool listing. For small servers, under twenty tools or so, seeing everything upfront can be faster than searching.
**GetTags** lets the LLM browse tools by tag metadata, then pass tags into Search to narrow results. Useful when tools have a natural taxonomy.
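The opt-in tools are enabled by listing them in the discovery configuration. A hedged sketch, assuming `ListTools` and `GetTags` are importable from the same module as `Search` and `GetSchemas` (check the FastMCP docs for the exact names):

```python
# Hedged sketch: all four discovery tools enabled. The import names for
# ListTools and GetTags are assumed, not confirmed by this post.
from fastmcp.experimental.transforms.code_mode import (
    CodeMode,
    Search,
    GetSchemas,
    ListTools,
    GetTags,
)

code_mode = CodeMode(
    discovery_tools=[
        GetTags(),                       # browse the taxonomy for orientation
        Search(default_detail="brief"),  # narrow by query, optionally by tag
        GetSchemas(),                    # precise parameter details
        ListTools(),                     # full catalog, useful on small servers
    ],
)
```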
The discovery configuration is where the server author’s knowledge becomes design. A large platform server might use all four tools with progressive detail levels: tags for orientation, search for narrowing, schemas for precision. A smaller server can collapse to two stages by bumping search detail:
```python
from fastmcp.experimental.transforms.code_mode import CodeMode, Search, GetSchemas

code_mode = CodeMode(
    discovery_tools=[Search(default_detail="detailed"), GetSchemas()],
)
```

Now search returns parameter schemas inline, and the LLM goes straight from search to execute. GetSchemas stays available as a fallback for complex parameter trees.
This two-stage configuration is exactly the pattern Cloudflare shipped for their API: search returns enough detail to write code, execute runs it. In FastMCP, it’s one line applied to any server. Cloudflare’s results, along with early usage patterns, suggest two-stage may be the better default for most servers. It’s something we’re actively evaluating.
A very simple server can skip discovery entirely and bake tool instructions into the execute tool’s description:
```python
code_mode = CodeMode(
    discovery_tools=[],
    execute_description=(
        "Available tools:\n"
        "- add(x: int, y: int) -> int: Add two numbers\n"
        "- multiply(x: int, y: int) -> int: Multiply two numbers\n\n"
        "Write Python using `await call_tool(name, params)` and `return` the result."
    ),
)
```

Each of these patterns is a conscious choice about the tradeoff between token cost and discovery accuracy. The server author makes that choice once, and every client benefits. This is the fundamental advantage of server-side code mode: the person who knows the tools best is the one deciding how they’re discovered and composed.
## Composition
In the FastMCP 3.0 architecture, components flow through a pipeline. Providers source them; transforms modify them on the way to clients. A transform can rename, filter, namespace, or reshape what a provider exposes, and transforms compose: stack them, and each one processes the output of the previous.
CodeMode is a transform. It works with everything else in the system without special-casing.
Apply it to an entire server, or to just one provider. Some tools go through code mode, others stay directly accessible. Chain it with other transforms: add a namespace to a mounted sub-server, then apply CodeMode to the result. Filter tools by tag or version, then wrap whatever passes through.
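As a sketch of how stacking might look in code: only `CodeMode` and the `transforms` parameter are confirmed by this post, so the namespacing transform is shown commented out as a hypothetical stand-in for whatever your FastMCP version provides.

```python
# Hedged sketch of transform stacking. Each transform processes the output of
# the previous one; the Namespace line is a hypothetical placeholder.
from fastmcp import FastMCP
from fastmcp.experimental.transforms.code_mode import CodeMode

mcp = FastMCP(
    "Server",
    transforms=[
        # Namespace("billing"),  # e.g. prefix tools from a mounted sub-server
        CodeMode(),              # then wrap whatever passes through
    ],
)
```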
One pattern worth highlighting is to proxy a remote server, then apply CodeMode:
```python
from fastmcp.server import create_proxy
from fastmcp.experimental.transforms.code_mode import CodeMode

remote = create_proxy("https://api.example.com/mcp")
remote.add_transform(CodeMode())
remote.run()
```

That remote server now has a code execution interface with tunable discovery. The original authors didn’t build one. The person running the proxy configured one that fits their application.
The behavior falls out of the architecture.
Coming soon: We’re adding configurable code mode for every server hosted on Prefect Horizon. No code changes required.
## The Sandbox
The Python execution environment is sandboxed via Pydantic’s Monty project, an experimental Python sandbox that restricts LLM-generated code to call_tool and standard Python. No filesystem access, no network access, nothing outside the sandbox boundary.
Building a Python sandbox that’s secure enough for production and flexible enough to be useful is genuinely hard. The Pydantic team has been doing excellent work on Monty, and CodeMode wouldn’t exist without it.
Resource limits are configurable: timeouts, memory caps, recursion depth.
```python
from fastmcp.experimental.transforms.code_mode import CodeMode, MontySandboxProvider

sandbox = MontySandboxProvider(
    limits={"max_duration_secs": 10, "max_memory": 50_000_000},
)

mcp = FastMCP("Server", transforms=[CodeMode(sandbox_provider=sandbox)])
```

The sandbox provider itself is replaceable. Implement the `SandboxProvider` protocol and point CodeMode at a Docker container, a remote execution service, whatever fits the deployment.
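To illustrate the replaceable-provider idea, here is a self-contained toy sketch. The real `SandboxProvider` protocol's method names are not shown in this post, so the single `execute` method below is an assumption, and the in-process "sandbox" is deliberately crude; a real provider would ship the code to Docker or a remote service.

```python
# Toy sketch of a swappable sandbox provider. The `execute` method name and
# signature are assumed, not FastMCP's actual protocol.
from typing import Any, Callable, Protocol

class SandboxProvider(Protocol):
    def execute(self, code: str, call_tool: Callable[..., Any]) -> Any: ...

class InProcessSandbox:
    """Runs code with only `call_tool` in scope. Crude isolation only:
    builtins are stripped, but this is nowhere near production-safe."""

    def execute(self, code: str, call_tool: Callable[..., Any]) -> Any:
        scope: dict[str, Any] = {"call_tool": call_tool}
        exec(code, {"__builtins__": {}}, scope)
        return scope.get("result")

def call_tool(name: str, params: dict) -> Any:
    # Minimal stand-in for the server's tool dispatch.
    return {"add": lambda x, y: x + y}[name](**params)

sandbox: SandboxProvider = InProcessSandbox()
out = sandbox.execute('result = call_tool("add", {"x": 3, "y": 4})', call_tool)
```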
## Getting Started
```bash
pip install "fastmcp[code-mode]"
```

CodeMode is experimental. The core interface is stable, but the specific discovery tools and their parameters may evolve as we learn more about what works in practice.
Happy (context) engineering!