
MCP 2026's New Main Thread: Drawing a Definitive Boundary at the Protocol Layer

When we use large models to handle time-consuming tasks—like organizing a large codebase or cleaning up hundreds of thousands of medical molecular data points in the background—developers often run into a headache.

You want the agent to keep an eye on the task in the background until it finishes. Instead, it calls a tool to check the status and finds the task still processing; then, on the next query, it misremembers the task name, or after a single check it wrongly assumes the task is done and hands over incomplete results.

As soon as you let an agent run a task that takes over 30 seconds, under the current development model, you’re almost guaranteed to hit this problem.

The old solution was to layer prompts on top of the model, forcing it to poll step by step through rigid rules. But the model is inherently stochastic. The more rigid rules you use to constrain each step, the more bloated the code gets. In the end you still fail to control the model, and most of the developer’s time goes to cleaning up after its mistakes.

In “Another Kind of Safety in the AI Era,” we discussed a shift in development philosophy: in the past, a programmer’s sense of safety came from process determinism, locking down every causal branch with hard-coding. In the AI era, rather than rigidly constraining every step, it’s better to shift toward result certainty: define the endpoint, write an auto-running acceptance script, and let the model feel its way through the execution-and-correction loop.

That said, as this model landed in production, people quickly discovered its limits: going to either extreme comes with engineering side effects.

Holding every link tight with hard-coded logic, or tossing everything to the large model to watch over—neither extreme works in practical system design.

What makes architecture design valuable is not blindly chasing one extreme, but understanding this spectrum of uncertainty and finding the right position for each specific link.

The recently released Model Context Protocol 2026 Roadmap introduced the Tasks primitive specifically for handling async tasks.

Originating in proposal SEP-1686, the primitive draws a clear line at the protocol layer. It is more than a patch that makes async calls convenient: it restates, at the protocol level, the division of labor between large models and traditional code, giving agent applications a deterministic layer beneath the model.

Leave Orchestration to Deterministic Code

The design philosophy behind the Tasks primitive is: large models should no longer handle workflow tasks like status polling, timeout calculation, and retry monitoring.

The protocol defines a state machine at the underlying level. After creation, an async task enters the created state, followed by the working state, and finally there are only three irreversible terminal states: successfully completed, declared failed, or cancelled by the user.
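The lifecycle described above can be sketched as a small transition table. This is only an illustration under stated assumptions: the status names follow this article's description (created, working, and three terminal states), and the `TaskStatus` enum and `advance` helper are hypothetical names, not part of any MCP SDK.

```python
from enum import Enum


class TaskStatus(Enum):
    # Status names mirror the article's description; treat them as assumptions.
    CREATED = "created"
    WORKING = "working"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELLED = "cancelled"


# Terminal states are irreversible: they have no outgoing transitions.
TERMINAL = frozenset(
    {TaskStatus.COMPLETED, TaskStatus.FAILED, TaskStatus.CANCELLED}
)

# Simplified table: created -> working -> one of the terminal states.
ALLOWED = {
    TaskStatus.CREATED: frozenset({TaskStatus.WORKING}),
    TaskStatus.WORKING: TERMINAL,
}


def advance(current: TaskStatus, new: TaskStatus) -> TaskStatus:
    """Return the new status, or raise if the transition is not allowed."""
    if new not in ALLOWED.get(current, frozenset()):
        raise ValueError(f"illegal transition: {current.value} -> {new.value}")
    return new
```

Because the table is plain data, the host can reject an impossible transition (for example, a completed task going back to working) without asking the model anything.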

These lifecycle state transitions are entirely driven by deterministic code mechanisms within the host application; the large model does not participate in judging intermediate polling states.

During this time, the host application can directly call the four basic JSON-RPC methods defined by the protocol (tasks/get, tasks/result, tasks/list, tasks/cancel) to query status and fetch results. The large model doesn’t need to worry about how many seconds have elapsed in the background, doesn’t need to write loop code, and doesn’t need to deal with various temporary task identifiers.
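To make the four methods concrete, here is a minimal sketch of how a host might build the JSON-RPC 2.0 envelopes. The method names come from the text above; the `taskId` parameter name and the `task_request` helper are assumptions for illustration, so check the specification for the exact parameter shapes.

```python
import itertools
import json
from typing import Optional

# The four task methods named in the article.
TASK_METHODS = ("tasks/get", "tasks/result", "tasks/list", "tasks/cancel")

_request_ids = itertools.count(1)  # monotonically increasing JSON-RPC ids


def task_request(method: str, task_id: Optional[str] = None) -> str:
    """Serialize a JSON-RPC 2.0 request for one of the task methods."""
    if method not in TASK_METHODS:
        raise ValueError(f"unknown task method: {method}")
    if method != "tasks/list" and task_id is None:
        raise ValueError(f"{method} needs a task id")
    # "taskId" is an assumed parameter name, used here for illustration only.
    params = {} if task_id is None else {"taskId": task_id}
    return json.dumps(
        {
            "jsonrpc": "2.0",
            "id": next(_request_ids),
            "method": method,
            "params": params,
        }
    )
```

The point is that the task identifier lives in host code, never in the model's context, so there is nothing for the model to misremember.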

Why place orchestration on the process-deterministic end?

Because the core characteristics of polling, timeout, and retry are stability and predictability—and that’s precisely where deterministic code shines.

If these supervisory tasks are left entirely to the large model, expecting it to deduce via natural language whether a task has ended or needs to keep waiting, the results are often unreliable.

The code migration and Deep Research cases listed in the SEP-1686 proposal are precisely about this kind of pain point.

For example, in the past with automated code migration tools, to keep the large model from dropping out during a long conversion, developers had to split the task into two tools—launch and status query—hoping to guide it through repeated polling via prompts. But the model would often hallucinate after just one or two polls, either misremembering the task name or mistakenly assuming the task was done and exiting early.

The protocol layer reclaims these orchestration responsibilities that require process determinism, letting the host application handle all polling with deterministic code at the underlying level. Once the final result is obtained, the data is then presented to the large model for processing. The large model just needs to issue a request tagged with a task marker at the start, and then it can go do other things.
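The host-side polling described above might look roughly like the following. It is a sketch, not the protocol's reference behavior: `get_status` stands in for whatever call the host makes to read the task status, and the status strings, interval, and timeout defaults are assumptions.

```python
import time
from typing import Callable

# Assumed names for the three terminal statuses.
TERMINAL = frozenset({"completed", "failed", "cancelled"})


def poll_until_terminal(
    get_status: Callable[[], str],
    interval_s: float = 1.0,
    timeout_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll with plain code until a terminal status or the deadline."""
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status in TERMINAL:
            return status  # only now is the result handed to the model
        if clock() >= deadline:
            raise TimeoutError(f"task still '{status}' after {timeout_s}s")
        sleep(interval_s)
```

`sleep` and `clock` are injectable so the loop can be tested without waiting in real time. The model never sees the intermediate "working" responses, which is exactly the behavior the protocol is trying to guarantee.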

[Figure: Architectural comparison of deterministic layering]

When a Tool’s Internals Become Another Agent System, Can This Nesting-Doll Design Really Be Coherent?

MCP standardizes determinism at the orchestration layer and lays down a strict track for the entire lifecycle, but it doesn’t restrict how tools operate internally.

Whether the tool internally runs an ordinary file read/write or executes complex Deep Research, the protocol layer imposes no restrictions.

In practice, this creates a new tension: what if the tool’s internals are also a multi-agent system (say the host invokes a Deep Research tool, which internally spawns three sub-agents to run retrieval and organization in parallel)?

In that case, the outer layer is a shell locked down by process code, while the inside wraps a kernel driven by multi-agent error-correction loops.

Can this outer-deterministic, inner-stochastic nesting-doll design really be coherent in engineering practice?

When the inner multi-agent system encounters anomalies in fuzzy reasoning and struggles within its correction loops, how do we surface diagnostic logs and control signals without breaking the outer deterministic code’s state machine?

Faced with this nesting of different certainty layers, the current protocol specification offers no answers—but this is precisely the unavoidable challenge for future complex agent architectures.

The Protocol Layer Doesn’t Make Global Decisions; Each Scenario Chooses Its Own Position

The tool invocation chains of large models are endlessly varied. Which tools should use async polling, and which should use plain synchronous calls?

MCP doesn’t impose a one-size-fits-all decision on developers. Instead, through capability negotiation, it provides a three-state setting called execution.taskSupport, which a tool can declare as forbidden, optional, or required.

The design philosophy behind this mechanism: the protocol layer doesn’t make decisions for developers—it only provides a universal communication framework and state machine.

As to where each tool should fall on the spectrum, the choice is delegated directly to the specific tool developers.

For instance, a molecular property analysis tool might return results in three seconds under light testing—the developer could set it to optional or even leave it as straight sync. But once it enters production and needs to process hundreds of thousands of data points, the developer can dynamically declare it as required based on actual time consumption, forcing the system into the async state machine.
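As a rough sketch of this negotiation, assume the three execution.taskSupport values are forbidden, optional, and required, and that an undeclared tool is treated as forbidden. The `declare_task_support` and `choose_mode` helpers and the 30-second threshold are hypothetical; they illustrate the decision and are not an SDK API.

```python
from typing import Optional

SUPPORT_VALUES = ("forbidden", "optional", "required")


def declare_task_support(expected_seconds: float, threshold_s: float = 30.0) -> str:
    """Tool side: pick a declaration from measured runtime (hypothetical rule)."""
    return "required" if expected_seconds >= threshold_s else "optional"


def choose_mode(task_support: Optional[str], client_supports_tasks: bool) -> str:
    """Host side: decide between a plain sync call and a task-augmented call."""
    # Assumption: a tool that declares nothing is treated as "forbidden".
    support = task_support or "forbidden"
    if support not in SUPPORT_VALUES:
        raise ValueError(f"unknown taskSupport value: {support}")
    if support == "required":
        if not client_supports_tasks:
            raise RuntimeError("tool requires task execution; client lacks it")
        return "task"
    if support == "optional" and client_supports_tasks:
        return "task"
    return "sync"
```

With this split, the molecular analysis tool above would declare optional while it returns in three seconds and switch to required once production runs cross the threshold, with no change to the host's decision logic.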

Natural Extension and Current Productization Status

Compared to the division of orchestration and negotiation, the other details of the Tasks protocol are fairly conventional.

Take the input_required state in the state machine. When a task is halfway through and needs human intervention (such as entering a two-factor authentication code or confirming temporary authorization for a sensitive operation), the protocol provides a standard way to surface the pending request to the host application, which collects the data from the user; the task then returns to the working state and continues.
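A host-side handler for this detour could be sketched as below. The event dictionaries, the `prompt` field, and the `ask_user` and `send_input` callbacks are all hypothetical stand-ins; the real message shapes are defined by the specification, not by this example.

```python
from typing import Callable, Iterable

# Assumed names for the three terminal statuses.
TERMINAL = frozenset({"completed", "failed", "cancelled"})


def drive_task(
    events: Iterable[dict],
    ask_user: Callable[[str], str],
    send_input: Callable[[str], None],
) -> str:
    """Consume status updates; on input_required, ask the user and resume."""
    for event in events:
        status = event["status"]
        if status == "input_required":
            # Surface the pending request to a human, then forward the answer.
            answer = ask_user(event.get("prompt", "Input required"))
            send_input(answer)
        elif status in TERMINAL:
            return status
        # Any other status (for example "working") means: keep waiting.
    raise RuntimeError("status stream ended before a terminal state")
```

Note that the model is not consulted at any point in this loop: the question goes to the user and the answer goes back to the task.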

This isn’t really a new invention—it’s just the natural incorporation into the specification of practices already widespread in mainstream development tools.

Similarly, letting the large model only be responsible for initiating tasks and understanding final results, without meddling in the intermediate process, is also a rational division of the large model’s reasoning boundaries.

However, we also need to face the current limitations objectively.

Although SEP-1686 has been officially accepted and Tasks is written into the MCP 2025-11-25 specification, the mechanism still carries an experimental label. Restart semantics after task failure, cleanup strategies for the result lifecycle, and push mechanisms that don’t rely on polling (there is currently no standard webhook, only SSE) have yet to fully take shape in the ecosystem.

As for the higher-level Agent Graphs, in the current roadmap it’s more of a directional declaration. The vast majority of production systems involving cross-agent complex collaboration still heavily rely on framework code hand-crafted by developers.

On the security front, a spate of security incidents in 2025 (such as malicious GitHub issue injection causing models to read and exfiltrate code without physical isolation) forced the specification to rapidly add enterprise-grade authentication like OAuth 2.1 by year’s end, using fine-grained temporary authorization and namespace isolation to lock agents into a sandbox.

At the current stage, developers don’t need to rush and tear down smoothly running systems to rewrite them with the new protocol.

That said, scanning the underlying specification’s evolution and proposal stream every quarter, treating it as a long-term signal for observing agent evolution trends, is indeed worthwhile.

[Figure: Architectural choices on the certainty spectrum]

Before Writing Prompts, First Separate Reasoning from Orchestration

Behind the evolution of underlying technology lies a practical architectural principle: when building agent systems and letting AI handle long-running async tasks, you should first map out all the steps in the business chain and clearly distinguish reasoning from orchestration.

First, pure reasoning. Any link that requires understanding ambiguous context, quality control via natural language, or flexible decision-making based on business scenarios should be left to the large model. In this part, developers need to accept process uncertainty and design verifiable acceptance criteria at the outer layer, rather than constraining model reasoning with hard-coding. Take advantage of falling token costs to let the model feel its way through the execution-and-correction loop until it produces a result that passes acceptance.
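One way to picture "accept process uncertainty, verify the result" is a loop like the one below. It is a minimal sketch: `attempt` stands in for a model call, `accept` for the acceptance script, and both names and the attempt cap are assumptions made for illustration.

```python
from typing import Callable, Optional


def run_until_accepted(
    attempt: Callable[[Optional[str]], str],
    accept: Callable[[str], Optional[str]],
    max_attempts: int = 5,
) -> str:
    """Retry the stochastic step until the deterministic check passes.

    `attempt` receives the previous failure message (None on the first try).
    `accept` returns None when the candidate passes, else a failure message.
    """
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        candidate = attempt(feedback)
        feedback = accept(candidate)
        if feedback is None:
            return candidate
    raise RuntimeError(f"not accepted after {max_attempts} attempts: {feedback}")
```

The code says nothing about how the model arrives at a candidate; it only fixes the endpoint and the budget, which is the division this section argues for.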

Second, pure orchestration. Any mechanical process steps like status polling, retry monitoring, timeout calculation, and task cancellation should be stripped from prompts and returned to deterministic code logic implemented in the host application. In these links, pursue process determinism and lock every causal branch with hard-coding.

Entrusting orchestration work to the large model, expecting prompts to make it act as a state machine, is a common architectural mistake in current agent systems. Clarifying the boundary between the two—having deterministic code handle process orchestration and the large model handle content reasoning—is the foundation for building stable agent systems.