In July 2026, MCP released a release candidate for the new protocol specification, shifting the protocol core from stateful back to stateless. In simple terms: previously, calling a tool required a handshake to get a session handle, and all subsequent requests had to carry it. Now that is gone. Each request carries all its own information, and any server instance can handle it. In the announcement, the official statement says the new version can scale horizontally on ordinary HTTP infrastructure without the server needing to maintain connection state.
At the opposite end is OpenAI. Ever since launching the Responses API alongside GPT-5 in March 2025, OpenAI has been steadily moving toward statefulness. The new interface lets the server retain the model’s reasoning process and conversation history, so developers no longer need to pass the full context themselves every time. In August 2025, OpenAI announced the Assistants API would be deprecated in August 2026, guiding everyone to migrate to Responses. In December 2025, the Codex team notified custom providers to migrate their interfaces to the Responses format. The official blog writes: “Just as Chat Completions replaced Completions, we expect Responses to become the primary way developers build applications with OpenAI models.”
One removes state, one adds state. Opposite directions.
MCP changed its state management three times in a year and a half, each time pushed by external adoption pressure.
The first phase began on November 5, 2024. When MCP was born, it only supported stdio, and the protocol had no explicit session management. The client and server had to run on the same machine, in a single process. In a lab setting, this design was fine: a researcher runs one experiment, opens one process, closes it when done. State didn’t need managing. Today, the official MCP specification page labels this version as Legacy.
The second phase began on March 18, 2025. MCP introduced Streamable HTTP to support remote deployment. Moving from a single process to multiple requests that might hit different instances required some form of continuity. The designers chose the most familiar solution: the client sends a handshake request to get a session handle, and all subsequent requests carry it. This is the same approach used in the SOAP era before the REST revolution — works fine on a single instance, breaks down under multi-instance horizontal scaling.
The engineering foresight gap is most apparent at this step. The design assumptions from the stdio era were single-machine, single-process. When HTTP arrived, they simply grafted on sessions — the most familiar mechanism for remote continuity. It didn’t account for load-balanced multi-instance state sharing, nor did it leave room for gateways to route without deep packet inspection. Some in the community pointed out early on that requiring the client and server to be co-located was odd, and Anthropic mentioned on the roadmap that it might change in 2025. But for a long time there was no answer on how, to what, or how migration would work.
By the stable release on June 18, 2025, MCP still used session-based transport. The community began deploying remote MCP servers, and the session design collided with scalability. Developer Sai Nitesh Palamakula, in his reproduction experiment, documented a real failure: two MCP server instances sat behind a round-robin load balancer. A client’s handshake request landed on instance A, which created a session in memory and returned a handle. The SDK’s subsequent long-lived connection request was then hashed to instance B, which had no local session and returned 404. The client hung. The ops fix was either to turn on sticky sessions, binding each client to a single instance and destroying horizontal scaling, or to introduce Redis for shared session storage, adding complexity and latency to the pipeline. If the gateway wanted intelligent routing, it had to deep-parse request bodies to extract session handles, driving up CPU consumption.
On July 28, 2026, the release candidate removed the handshake and session handle, shifting the protocol core back to stateless. Protocol version, client identity, capability descriptions — all now packed into each request, making requests self-contained so any server instance can handle them. The gateway can route without unpacking request bodies, tool lists become cacheable, and distributed tracing can cross SDK boundaries. This was the result of six proposals changed together — not a single-point patch.
MCP’s state management was never the result of long-term planning. The stdio era didn’t consider remote. The HTTP era defaulted to sessions. Only when horizontal scaling broke did they go stateless. Every step was patching yesterday’s hole.
Figure: MCP state management in three stages over a year and a half. Left to right: stdio implicit state (process as session) → Streamable HTTP introduces explicit session handle → stable version retains it, community hits the scaling wall → RC drops sessions, goes stateless. Every step pushed by adoption pressure.
MCP’s reactive evolution is directly tied to its origins.
MCP was initially designed by Anthropic’s AI scientists in a lab. Its purpose was to let researchers rapidly iterate on agentic AI experiments. The core assumption: all information exchange flows through the model’s context window, and the model has full visibility into tool calls and results. This is stateless at the semantic layer — state is managed by the model through the context window, and the protocol itself records no history.
But when HTTP was added at the transport layer, state crept in. Operations like the server pushing messages to the client or mid-stream user input requests require persistent connections. The mismatch between a stateless semantic layer and a stateful transport layer is the root of all the production deployment headaches that followed.
The 2026-07-28 release candidate fixes this mismatch: it removes the session handle, makes the transport layer stateless, and uses explicit handles to maintain continuity. A tool returns a cart ID; the model passes it back as a regular parameter on the next call. State becomes an identifier that the model can see and reason about. This is the same approach HTTP APIs use to maintain state through resource identifiers — the path REST took decades ago.
This actually helps agent reasoning. Under the session design, state was hidden in transport-layer metadata, invisible to the model. Switching to handles means the model can compose across tools, pass state between steps, and reason about it. For example, the model can create two shopping carts simultaneously, receive two handles, compare their item lists in its reasoning, or move items from one to the other. This kind of cross-tool orchestration was impossible under transport-layer sessions. The official announcement notes that this approach of externalizing state is not just a replacement for session state — it is often more powerful in practice.
The cost is that tool authors must now manage handle scope, validation, and expiration themselves. Don’t let an identifier become an unrestricted skeleton key.
OpenAI’s API was designed for production from day one. Its evolution took the opposite path from MCP.
Chat Completions is stateless — the client maintains its own history and sends the full conversation to the server with each request. This interface dominated the entire AI industry. DeepSeek, Qwen, MiniMax, Moonshot, GLM, Gemini OpenAI-compatible, OpenRouter — almost every provider exposes a Chat Completions endpoint. It became the de facto standard. For developers, statelessness means near-zero switching cost: change a model ID and you’re done. No friction between providers.
The Responses API, launched alongside GPT-5 in March 2025, is optionally stateful. The default behavior is still stateless, like before, but developers can opt to let the server retain the reasoning process and pick up from it on the next turn, without retransmitting the full context every time. Later, the Conversations API was added, absorbing conversation history persistence into the server as well. OpenAI’s official position is that retaining reasoning process across turns yields a +5% TAUBench improvement on benchmarks — a genuine technical value, not pure marketing.
But the Responses API didn’t just add reasoning-state persistence. It also absorbed hosted tools: web search, file retrieval, code execution, computer use, and MCP all run on OpenAI’s servers, with the client no longer running them locally. This means the core agent loop moved from the client side into the vendor’s infrastructure.
OpenAI writes in its official blog: “Just as Chat Completions replaced Completions, we expect Responses to become the primary way developers build applications with OpenAI models.” In August 2025, it announced the Assistants API would be deprecated in August 2026. In December 2025, the Codex team notified custom providers to migrate their interfaces to the Responses format.
This is not technical regression. It is platform strategy. When stateless, the API is a commodity — the Chat Completions format is highly unified, and developers can swap the underlying model at any time. Statefulness builds a moat: reasoning state lives on the server, hosted tools run on the server, conversation history stays on the server. The deeper you use it, the higher the migration cost. The lock-in logic of agent platforms applies here too: once system prompts, skill configurations, tool connections, and harness configurations are built on a platform, the migration effort far exceeds swapping an API key. The Responses API pushes this lock-in from the configuration layer into the runtime state layer.
OpenAI pushes stateful Responses with full force. The community largely stays on stateless Chat Completions.
First, performance regression. Multiple posts on the OpenAI developer forum show measured data: the Responses API is 2-3x slower than Chat Completions, and referencing a prior response can push latency past 10 seconds, with extreme cases reaching 2-10 minutes. OpenAI engineer Steve Coffey himself suggested turning off server-side storage to reduce latency — effectively reverting to stateless mode.
Second, the token economics didn’t materialize. Community testing found that when referencing a prior response, the input token count was nearly identical to Chat Completions — the server still has to run through the full history internally. The claim of “not needing to pass the full context” only saves network transmission, not inference cost.
Third, the open-source community explicitly opposes pulling the agent loop into vendor infrastructure. Hugging Face, in its Open Responses blog, wrote directly: “Chat Completion format is still the de facto standard despite the alternatives,” and described the Responses format as “closed and not as widely adopted.” In the comments, someone wrote: “Agent loops are supposed to be implemented in the agent system not in the LLM vendors system,” objecting to vendors absorbing the agent loop into their own systems.
Figure: Side-by-side comparison. Left: OpenAI moves from stateless Chat Completions toward the stateful Responses API (reasoning state, conversation history, and hosted tools absorbed into the server), while the community stays on the stateless side. Right: MCP moves from implicit state via stdio to explicit sessions and then to stateless, with the community following. Two paths, opposite directions.
MCP and OpenAI chose opposite paths on state management. Behind that choice lies a difference in incentive structures.
OpenAI is a vendor. Statefulness creates lock-in. Lock-in sustains rent collection. Rent collection funds model R&D. MCP is an open standard under the Linux Foundation, with no rent-collection motive. Its goal is survival — interoperability, production-readiness. Statelessness is good for the ecosystem; statefulness is good for the vendor.
This is the same dynamic that played out in the evolution of web protocols. SOAP was a vendor-driven stateful protocol with WS-Session, session replication, sticky routing, and deep packet inspection at the gateway — every one of those becomes an ops cost and failure source the moment you try to scale horizontally. REST was the community-driven stateless counter-move: externalize state as explicit resource paths, and horizontal scaling for internet applications becomes genuinely viable. REST won. MCP is walking the same path, just over a decade late.
But the two paths also reveal a gap in engineering foresight between the two camps.
MCP’s statelessness was not designed in from the start. The stdio era didn’t consider remote. The HTTP era defaulted to sessions. Only when horizontal scaling broke did they go stateless. Every step was patching yesterday’s hole, not laying the foundation for tomorrow.
OpenAI’s push toward statefulness is an active platform strategy. From day one, it has been thinking about how to absorb the reasoning process and agent loop into its own infrastructure, using interface deprecation to drive standard replacement.
One is pushed by circumstances. The other actively shapes them. This is not a matter of technical taste. It is a matter of organizational positioning: MCP is research legacy passively converted to engineering; OpenAI is commercial drive doing platform lock-in.