I think SEP-1577 is the sleeper hit of the new Model Context Protocol (MCP) specification.
Hidden behind a dry title (“Sampling with Tools”) is a feature that enables a complete architectural inversion of how we build and deploy AI agents.
“Sampling” is the mechanism by which an MCP server asks the client’s LLM to generate text (e.g., “Hey Claude, summarize this data”). When I started building FastMCP 2.0 back in April 2025, this was the feature that excited me the most — and here’s proof!
But as far as I can tell, it has a grand total of approximately one power user: FastMCP maintainer Bill Easton.
(Edit: an hour after posting this, I found the other power user. Unsurprisingly, it’s Angie Jones, who just shared a fantastic blog post on MCP sampling.)
While the rest of us were building standard tools, Bill was pushing sampling as far as it could go. He hacked together tool calling, structured results, and agentic loops on top of the previous, very limited version of the protocol. He saw the potential before the spec even supported it.
Now, with SEP-1577, the official spec has caught up to Bill’s vision. And the more time I spend with it, the more mind-bending I find it. It looks and feels exactly like every agent framework I’ve ever used, but the deployment model is completely backwards.
How We Build Agents Today
To understand the shift, look at how we build agents today.
In frameworks like Pydantic AI, LangChain, or Prefect’s very own Marvin, the “Agent” is a capital-C Client. It is a Python script running on your machine. It holds the state, the system prompt, and the loop that decides which steps to take next.
In this model, the Server provides remote LLM completion functionality. It does not run custom code, dictate logic, or even hold state. The Client orchestrates all activity.
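To make that concrete, here’s roughly what today’s model looks like in Pydantic AI (a minimal sketch; the model string is a placeholder, and result attribute names vary slightly between versions):

```python
from pydantic_ai import Agent

# The agent -- its state, system prompt, and loop -- lives in this local script.
agent = Agent(
    "anthropic:claude-sonnet-4-0",  # placeholder model id
    system_prompt="You are a code janitor. Find and flag dead code.",
)

# The client drives everything; the server only serves completions.
result = agent.run_sync("Clean up my repository.")
print(result.output)
```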
This works, but it has a massive distribution problem. If I want to share my “Code Janitor Agent” with you, I have to send you a repository. You have to install Python, manage dependencies, set up environment variables, and run the script. The “Agency” is locked inside my local environment.
Flip It
SEP-1577 flips this stack upside down.
It allows an MCP Server to define a sampling request that includes tools. The Server can now say to the Client:
“Here’s a goal, and here are the tools you need to achieve it. You provide the raw intelligence, but I’ll control the flow.”
The Server holds the prompt. The Server holds the workflow logic. The Server holds the tools. It can change them at any time.
When you connect a generic client—like Claude Desktop, Cursor, or a simple IDE plugin—to this server, the client doesn’t need to know anything about the agent’s logic. It just acts as the compute engine. The Server effectively “borrows” the Client’s LLM to drive its own internal agent.
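Conceptually, the request the Server sends looks something like this, sketched as a Python dict. The messages and maxTokens fields follow the existing sampling/createMessage shape; the exact spelling of the tools field under SEP-1577 is my assumption:

```python
# Rough sketch of a server -> client sampling request that includes tools.
# The "tools" field is SEP-1577's addition; exact field names are assumptions.
sampling_request = {
    "method": "sampling/createMessage",
    "params": {
        "messages": [
            {
                "role": "user",
                "content": {"type": "text", "text": "Research FastMCP and summarize."},
            }
        ],
        "tools": [
            {
                "name": "search_web",
                "description": "Search the web for information.",
                "inputSchema": {
                    "type": "object",
                    "properties": {"query": {"type": "string"}},
                    "required": ["query"],
                },
            }
        ],
        "maxTokens": 1000,
    },
}
```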
But Text Is Useless
There’s a problem with raw sampling: it returns natural language text.
MCP servers are programmatic. They need to parse, validate, and act on data. Getting back “The temperature is about 72 degrees and it’s partly cloudy” is almost useless as a building block—you’d have to parse the text all over again just to extract the values.
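For example, a plain sampling call hands the server back a block of prose (a minimal sketch, assuming the client’s LLM responds with text):

```python
from fastmcp import FastMCP, Context

mcp = FastMCP("Weather Demo")

@mcp.tool
async def describe_weather(city: str, ctx: Context) -> str:
    # Plain sampling: the client's LLM answers in natural language.
    result = await ctx.sample(f"What is the current weather in {city}?")
    # All we get back is a text block -- the numbers are trapped inside it.
    return result.text
```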
FastMCP solves this by layering structured output on top of SEP-1577’s sampling primitives:
```python
from fastmcp import FastMCP, Context
from pydantic import BaseModel

mcp = FastMCP("Weather Demo")

class Weather(BaseModel):
    temperature: float
    conditions: str

@mcp.tool
async def get_weather(city: str, ctx: Context) -> Weather:
    # Same sampling call, but with a declared result type.
    result = await ctx.sample(
        f"What is the current weather in {city}?",
        result_type=Weather,
    )
    return result.result
```

The server borrows the client’s LLM, but gets back typed, validated data it can actually use. No parsing. No hoping the format is right. Just a Weather object.
This alone makes sampling practical. But the real power comes when you add tools.
Now Add Tools
Layer in tools, and things get interesting. We are adding first-class support for this in FastMCP, and what’s wild is how familiar the code looks. You write what looks like a standard client-side agent loop, but you deploy it as a server-side tool.
Here is what it looks like to build a research agent that uses structured output and tools, running entirely on the server:
```python
from fastmcp import FastMCP, Context
from fastmcp.server.sampling import sampling_tool
from pydantic import BaseModel

mcp = FastMCP("Research Agent")

# Define the output schema
class ResearchReport(BaseModel):
    summary: str
    sources: list[str]

# Define a helper tool for the agent
@sampling_tool
def search_web(query: str) -> str:
    """Search the web for information."""
    return f"Results for: {query}"

@mcp.tool
async def generate_report(topic: str, ctx: Context) -> ResearchReport:
    """A tool that acts as an autonomous research agent."""
    # The server orchestrates the loop!
    result = await ctx.sample(
        messages=[f"Research {topic} and summarize."],
        tools=[search_web],
        result_type=ResearchReport,
        max_iterations=5,
    )
    return result.result
```

If you’ve used Pydantic AI or Marvin, this pattern—passing tools and a result type to an LLM—is second nature.
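Under the hood, each of those iterations is presumably one full round trip: the server sends a sampling request carrying the tool definitions; the client’s LLM either answers outright or asks to call search_web; the server executes the tool, appends the result to the conversation, and samples again, until the model emits a valid ResearchReport or the iteration budget runs out.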
The difference is that this isn’t a script. It’s a tool on an MCP server.
Because of this, I don’t need to ship you a Python environment to run this agent. I just give you the server connection. You connect Claude Desktop to it, ask “Generate a report on FastMCP,” and your Claude instance instantly knows how to perform the research, call the web search tool, loop up to five times, and return the structured report.
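And if your client is a script rather than Claude Desktop, the handoff is just as thin. A minimal sketch (the URL is hypothetical, and a real client would also register a sampling handler so the server can borrow its LLM):

```python
import asyncio
from fastmcp import Client

async def main():
    # Connect to the deployed agent server (URL is hypothetical).
    # A real client would also wire up the LLM it lends out, e.g.
    # Client(url, sampling_handler=...).
    async with Client("https://example.com/mcp") as client:
        result = await client.call_tool("generate_report", {"topic": "FastMCP"})
        print(result)

asyncio.run(main())
```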
Universal Clients
We are moving from a world of “Thick Clients” to “Universal Clients.”
This solves the distribution problem for complex agentic workflows. You can wrap sophisticated logic—loops, chains of thought, structured validation—inside a standard MCP server. Any client that connects instantly “becomes” that agent.
It is effectively “Write Once, Run Anywhere” for AI agents.
We are shipping support for this in FastMCP as soon as the upstream SDK creates a stable foundation for it. Until then, keep an eye on the repo… and maybe send Bill a thank you note.