The agentic AI space today is crowded with products that all look like they’re doing the same thing. Claude Code, Codex CLI, Cursor, OpenCode, Responses API, OpenClaw, Microsoft’s Copilot family — all of them wire up a large model, add a tool-calling loop, and manage some runtime state. A natural question follows: will these converge into a single de facto standard, the way Chat Completions once did, so that developers can write once and run anywhere?
My answer is no. This piece tries to explain why. The point isn’t that it’s technically impossible. It’s that the commercial logic simply doesn’t support it.
Let’s trace how we got Chat Completions in the first place.
OpenAI kept the Chat Completions interface extremely simple. Take a messages array as input, return a single message. No state, no side effects, one call and done. That simplicity is exactly what let every other vendor copy the shape. The models and infrastructure behind them could be completely different, but the public interface ended up identical. By 2026, more than 80% of new model providers expose a /chat/completions endpoint on day one. DeepSeek, Kimi, GLM, Qwen, Mistral — not one of them broke ranks. It’s the clearest de facto standard in the entire LLM ecosystem.
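The shape is small enough to sketch in a few lines. A minimal illustration, payloads as plain dicts with no network call; field names follow OpenAI's published request/response schema, and the response content is a placeholder.

```python
# Illustrative sketch of the Chat Completions wire shape (plain dicts, no
# network call). Field names follow OpenAI's published schema.

def build_request(history, user_text, model="gpt-4o"):
    """Statelessness in practice: the client resends the entire history."""
    return {
        "model": model,
        "messages": history + [{"role": "user", "content": user_text}],
    }

history = [{"role": "system", "content": "You are a helpful assistant."}]
req = build_request(history, "What is a harness?")

# The response is equally small: one message plus usage metadata.
resp = {
    "choices": [{"message": {"role": "assistant", "content": "..."}}],
    "usage": {"prompt_tokens": 21, "completion_tokens": 3},
}

# To continue, the client appends both sides and resends everything.
history = req["messages"] + [resp["choices"][0]["message"]]
print(len(history))  # 3: system, user, assistant
```

Everything a vendor needs in order to be compatible fits in those two dict shapes, which is exactly why the interface was copyable.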
Two conditions carried it there. First, all the intelligence lived inside the model. The interface just pushed strings in and pulled strings out, carrying no additional semantics. Second, vendors had no incentive to differentiate at the protocol layer. Their moat was the model itself, so the more universal the interface, the easier it was for users to adopt their model. Universality worked in the vendors’ favor.
Neither condition holds now. To see why, we need to get clear on what a harness actually does.
In an earlier piece on Claude Code as a runtime, I split AI integration into four layers. Briefly: the model layer decides which LLM you use; the protocol layer, how you call it; the runtime layer, how state is managed, how tools are invoked, and how context is organized; the contract layer, what counts as done.
Chat Completions belongs to the protocol layer. That layer could be simple in that era because applications implemented runtime and contract themselves; the protocol only had to carry strings.
Where does a harness belong? At the runtime layer. Claude Code’s value isn’t in how it makes an HTTP request. It’s in how it manages the conversation: when to compact history, where to place prompt-cache breakpoints, whether a tool call needs user approval, how to keep the prefix stable so cache hit rates don’t collapse. Codex CLI’s value isn’t in its protocol implementation. It’s in the sandbox design, how it assembles AGENTS.md, how state moves between sub-agents. Cursor Agent’s value isn’t in which model it connects to. It’s in how it puts IDE context and agent instructions into the same interface.
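To make the layer distinction concrete, here is a deliberately toy runtime loop. Every name in it (Harness, _compact, needs_approval) is hypothetical, not any vendor's API; it only maps where the runtime-layer decisions named above would live.

```python
# A toy runtime loop. All names here are invented for illustration; this is
# not any vendor's API, just a map of where runtime-layer decisions live.

class Harness:
    def __init__(self, model_call, max_tokens=100_000):
        self.model_call = model_call   # protocol layer: strings in, strings out
        self.history = []
        self.max_tokens = max_tokens

    def _tokens(self):
        # Crude estimate: roughly 4 characters per token.
        return sum(len(m["content"]) // 4 for m in self.history)

    def _compact(self):
        # Runtime decision: when and how to compact. Keep the prefix (head)
        # untouched so prompt-cache hit rates don't collapse.
        head, tail = self.history[:2], self.history[-4:]
        summary = {"role": "system", "content": "[summary of elided turns]"}
        self.history = head + [summary] + tail

    def step(self, user_text, needs_approval=lambda call: False):
        if self._tokens() > self.max_tokens:
            self._compact()
        self.history.append({"role": "user", "content": user_text})
        reply = self.model_call(self.history)
        # Runtime decision: gate side-effecting tool calls behind approval.
        for call in reply.get("tool_calls", []):
            if needs_approval(call):
                raise PermissionError(f"approval required: {call['name']}")
        self.history.append({"role": "assistant", "content": reply["content"]})
        return reply["content"]

def echo_model(history):
    return {"content": f"echo: {history[-1]['content']}", "tool_calls": []}

h = Harness(echo_model)
print(h.step("hello"))  # echo: hello
```

Each of those small decisions is a point of divergence: Claude Code, Codex, and Cursor all answer "when to compact" and "what needs approval" differently, and none of their answers is visible at the protocol layer.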
There’s a key difference between the protocol layer and the runtime layer. The protocol layer’s complexity is capped. A handful of message-passing rules, a narrow implementation space. The runtime layer’s complexity is open-ended. Context management alone breaks into compaction strategy, cache breakpoints, sub-agent boundaries, tool schema organization, state serialization — dozens of sub-problems, each with several defensible designs.
So the question becomes: can the runtime layer be flattened into a shared shape the way the protocol layer was?
Something happened in Q1 of 2026. OpenAI, Cursor, and Anthropic published their harness engineering pieces almost simultaneously. The vocabulary overlapped heavily. The actual engineering problems had almost nothing to do with each other.
Anthropic was solving the problem of keeping one agent on track over four hours of continuous work. Cursor was solving the problem of running hundreds of agents in parallel without stepping on each other. OpenAI’s Symphony was solving the problem of steering a fleet of agents with minimal human intervention. One shared word, three completely different product pressures.
That divergence carried straight into each company’s design choices. Claude Code’s compaction strategy, its eight layers of anti-impersonation defense, the binding of thinking-block signatures to API keys — all focused on long sessions that don’t drift. Cursor’s recursive Planner-Worker architecture, worker isolation, tolerance for a small but stable error rate — all focused on parallel throughput. Codex’s App Server protocol, the Thread/Turn/Item primitives, the kernel-level sandbox — all focused on exposing a single agent core to TUIs, IDEs, the web, and Xcode without compromising safety.
Users need different things in different situations. Long-form writing and deep thinking lean on long-session stability, which is why Claude Code shines there. Bulk PR work and team collaboration lean on parallelism, which is where Cursor and Symphony fit in enterprises. Zero-friction cross-device access leans on smooth human coordination, which is why OpenClaw and Dispatch took off with consumers. When you need A today and B tomorrow, two harnesses simply aren’t interchangeable. That’s why one person ends up using several harnesses, and why they won’t merge.
Three vendors using the same word for three different things tells you the runtime layer doesn’t have a shared definition of the problem. Without a shared problem, there’s no shared answer, and the idea of a unified form never gets off the ground.
Every design choice at the runtime layer does two jobs at once: it delivers a technical capability, and it raises a commercial wall.
The Claude Code source leak made this explicit. The leaked code shows eight layers of anti-impersonation defense: native client attestation written in Zig, deliberately injected fake tools to poison distillation, thinking-block signatures bound to API keys, detection of intermediate proxy gateways, and more. Together these give Claude Code an official identity at the API endpoint, so that third-party impersonators and man-in-the-middle proxies can be recognized server-side. On March 21, 2026, Anthropic cut off the OAuth login path for third-party harnesses. Any third-party tool that wants Claude now has to pay for API access. After that, Kimi and GLM could still hook into Claude Code, but only through awkward workarounds, and switching model providers within a single session got noticeably harder for users.
The same pattern shows up in the Responses API, no source leak required — the interface design itself tells the story. Sean Goedecke pointed out that the API is stateful for a specific reason: to keep the reasoning trace on the server. Clients only receive a previous_response_id; they can’t see the full chain of thought. For OpenAI, this is a reasonable commercial choice, protecting the model’s reasoning from being distilled. But the same choice prevents third-party providers from genuinely implementing Responses API, because doing so would mean helping OpenAI hide CoT. As of April 2026, the list of third parties supporting Responses API is just Amazon Bedrock (for GPT-OSS models) plus a handful of aggregators. Compare that with the 80%+ adoption of Chat Completions among major providers — two orders of magnitude apart.
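The difference is visible in the wire shapes themselves. A hedged sketch follows, payloads as plain dicts with no network calls; `input`, `store`, and `previous_response_id` are real Responses API fields, while the server-side dict is a simplification for illustration.

```python
# Sketch of why server-held state blocks third-party implementations.
# `input`, `store`, and `previous_response_id` are real Responses API
# fields; the server-side dict is a simplification, no network calls.

# Turn 1: the client sends input; the server stores the full trace.
turn1 = {"model": "o4-mini", "input": "Prove X.", "store": True}
server_state = {"id": "resp_abc123", "reasoning": "<hidden chain of thought>"}

# Turn 2: the client references the prior turn by opaque id only.
turn2 = {
    "model": "o4-mini",
    "previous_response_id": server_state["id"],
    "input": "Now generalize the proof.",
}

# The client never holds the reasoning, which means any third party
# implementing this endpoint must store and hide it on the model's behalf.
client_visible = set(turn1) | set(turn2)
print("reasoning" in client_visible)  # False
```

Contrast this with the Chat Completions pattern, where the client resends everything and the server can be stateless; there, a third-party implementation requires no cooperation with the original vendor's secrecy goals.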
Matt Mayer ran a controlled measurement. The same Claude Opus scored 77% on SWE-bench inside Claude Code, and 93% inside Cursor. A 16-point spread between harnesses, and for once not in Claude Code’s favor. Two things follow. The harness can influence results on the scale of a generational model upgrade, and vendors have every reason to turn that influence into a product differentiator. Once a harness can move performance that much, it stops being shared infrastructure and becomes a competitive axis.
Mechanisms at the protocol layer only handle communication, so they can be shared. Mechanisms at the runtime layer carry both capability and moat, so they can’t. It’s not a matter of willingness. Sharing them runs straight into the vendor’s own commercial logic.
So much for the runtime layer. The protocol layer is shifting too. MCP’s trajectory is the clearest example.
MCP is a tool-invocation protocol Anthropic introduced at the end of 2024. It was positioned similarly to Chat Completions: a simple protocol layer that any vendor could adopt. The adoption numbers are strong. SDK monthly downloads went from around 100k in early 2025 to about 97 million by March 2026. PulseMCP lists more than 8,600 public servers. Roughly 28% of Fortune 500 companies run MCP servers in production. The Linux Foundation took over governance at the end of 2025.
Those figures mostly reflect breadth of adoption. Look at how each vendor actually implements MCP, and the divergence has been baked into the code for a while. OpenAI’s Apps SDK extended MCP in October 2025 by adding a _meta field that bypasses the model’s context window to pass GUI state directly to the frontend. That contradicts MCP’s core design principle — all information exchange should flow through context the model can see. Engineering-wise, the workaround made sense: stuffing a 10KB HTML payload into context pollutes reasoning, drives up cost, and drops success rates. Later, OpenAI formalized _meta into an openai/* extension path. Any MCP server that leans heavily on the Apps SDK no longer runs in Anthropic’s hosts, or anyone else’s.
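The fork shows up directly in the payload shapes. In this sketch, `_meta` is a genuine MCP field reserved for metadata, but the specific `openai/...` key is illustrative of the extension mechanism, not the Apps SDK's exact schema.

```python
# The standard MCP path and the Apps SDK path, side by side. `_meta` is a
# real MCP field; the specific `openai/...` key below is illustrative of
# the extension mechanism, not the SDK's exact schema.

standard_result = {
    "content": [{"type": "text", "text": "3 open invoices, total $1,240"}],
}

apps_sdk_result = {
    "content": [{"type": "text", "text": "3 open invoices, total $1,240"}],
    "_meta": {
        # GUI state routed straight to the frontend, bypassing the context
        # window: the model reasons over ~40 bytes, not a 10KB HTML blob.
        "openai/widgetState": {"html": "<table>...10KB of markup...</table>"},
    },
}

def model_visible(result):
    """What the model actually reasons over: content only, never _meta."""
    return [c["text"] for c in result["content"] if c["type"] == "text"]

# Identical to the model; the divergence is invisible under the shared
# protocol, which is exactly why adoption metrics miss it.
print(model_visible(standard_result) == model_visible(apps_sdk_result))  # True
```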
Other layers are splintering too. The original HTTP+SSE transport was deprecated, the migration to streamable HTTP shipped breaking changes, and already-deployed enterprise servers had to rewrite their transport layer. OAuth 2.1 arrived with a cluster of CVEs and localhost-related attack surface, and the patches brought more breaking changes. Kiro CLI has measurable compatibility issues with some MCP servers, and Websets authentication breaks in cross-host scenarios.
Feishu and DingTalk made the point more bluntly in March 2026. China’s two largest enterprise collaboration platforms open-sourced their command-line tools on GitHub within 24 hours of each other. DingTalk covered ten core capabilities; Feishu wrapped over 2,500 APIs into a CLI. Neither shipped an MCP server alongside. The platform vendors’ choice was clear: the preferred path for exposing platform capabilities to agents today is CLI + JSON output + --help as skill documentation, not MCP.
MCP’s situation shows one thing. Even a protocol designed specifically for the agentic era struggles to hold together across vendors, because the content and semantics it carries are far more complex than the pure string passing of Chat Completions. Adoption keeps climbing while implementations keep diverging underneath.
Up to this point, the argument could leave you feeling that nothing standardizes in the agentic era. That’s not the conclusion. The accurate version is: convergence is happening, just above and below the runtime layer, not at the runtime layer itself.
The clearest case is the command line as a de facto interface. Mainstream agent runtimes already run inside a shell — Claude Code, Codex, Cursor Agent, all of them. Their most reliable, lowest-friction actions are reading files, writing files, and executing commands. If a platform wraps its capabilities into a command-line tool with --json output and --help documentation, any agent can use it right away, no adapter needed. This is an extremely simple convention, barely a protocol — essentially Unix’s forty-year-old stdin/stdout/exit-code triple. Precisely because it’s too simple to differentiate on, no vendor has reason to hold it privately. It becomes the de facto interface by default. Feishu and DingTalk, Google Workspace’s experimental gws, HKU’s CLI-Anything — all examples along this path.
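The whole convention fits in one file. A minimal sketch using Python's stdlib: argparse gives --help for free, --json makes stdout machine-readable, and the return value becomes the exit code. The `invoices` tool and its fields are invented for illustration.

```python
# A minimal agent-friendly CLI. The `invoices` tool and its fields are
# invented; the convention is the point: --help for discovery, --json for
# machine-readable stdout, exit code for success/failure.

import argparse
import json
import sys

def list_invoices():
    # Stand-in for a real platform API call.
    return [{"id": "inv-1", "amount": 420}, {"id": "inv-2", "amount": 820}]

def main(argv=None):
    parser = argparse.ArgumentParser(
        prog="invoices",
        description="List open invoices. Agents: prefer --json output.",
    )
    parser.add_argument("--json", action="store_true", help="emit JSON to stdout")
    args = parser.parse_args(argv)

    rows = list_invoices()
    if args.json:
        json.dump(rows, sys.stdout)       # stdout carries the data
    else:
        for r in rows:
            print(f"{r['id']}\t{r['amount']}")
    return 0                              # exit code signals success/failure
```

An agent discovers the contract by running `invoices --help`, calls `invoices --json`, and parses stdout; the forty-year-old triple is the entire integration surface.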
AGENTS.md is the other. Released by OpenAI in August 2025, it is now adopted by more than 60,000 open source projects and supported by sixteen major harnesses — Claude Code, Codex, Cursor, Windsurf, Aider, Gemini CLI, Copilot, VS Code, and others — all reading it from the project root. The Linux Foundation’s Agentic AI Foundation took over governance in December 2025. What’s unified is the filename, the location, and the agreement that “agents will read this file.” What’s not unified is how the content should be structured, when it gets injected, how sub-directory files merge with the root, and how it coexists with CLAUDE.md, .cursor/rules, .github/copilot-instructions.md, and other older formats. Strictly speaking it’s a social contract, not a technical standard. It’s still the most widely adopted protocol-layer convention in the agentic era.
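What such a file looks like in practice (an invented example; the only standardized parts are the filename, the root location, and the agreement that agents read it — the section structure is each project's own choice):

```markdown
# AGENTS.md

## Build and test
- Run `npm install && npm test`; every change must keep tests green.

## Conventions
- TypeScript strict mode; no default exports.
- Never edit files under `generated/` (build artifacts).
```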
These two de facto interfaces share a feature: neither is at the runtime layer. CLI + JSON sits at the bottom of the protocol layer, too simple to carry state management. AGENTS.md sits at the top of the contract layer, just a markdown file, with no execution semantics. They flank the runtime layer from above and below, landing in the exact spots where no vendor has any incentive to introduce proprietary extensions.
Look back at the attempts to standardize the runtime layer, and the endings are all the same. Responses API has exactly one major third-party implementation (Bedrock), plus a handful of aggregators. Claude Code is being actively locked down by Anthropic. MCP’s adoption is still growing, but implementations have diverged across vendors. Even Linux Foundation governance hasn’t stopped each host from layering its own dialect on top.
It’s not that the runtime layer is unimportant. The opposite: the runtime layer is the most important layer in this era. That’s precisely why vendors treat it as the main competitive battleground, and none of them has any incentive to surrender it to a standard.
For users: accept that harnesses won’t converge, and put your effort into matching harness to scenario. Claude Code’s stability over long sessions, long writing, and deep thinking comes from its investment in long-session work. Cursor Agent’s smoothness inside the IDE comes from its interaction-response design. Codex CLI’s open source nature gives it unique value in scenarios requiring deep modification. OpenClaw and Dispatch bring value to consumers through their zero-friction entry points. Using several harnesses at once isn’t chaos — it’s matching each situation to the strength that particular harness was built around.
For builders: a year ago, I wrote a piece about agentic frameworks, arguing you should forget all frameworks and build agents from first principles. In 2026, that needs updating. Forgetting frameworks is no longer the question; the agent loop no longer needs to be written by hand, because Claude Code, Codex, Cursor, and others have matured it to the point where it can be reused directly. The new trap to avoid: don’t expect a universal runtime. No single runtime will become the de facto standard, and vendor vertical integration will only deepen. The reliable de facto interfaces come down to two: shell-based command lines with JSON output, and AGENTS.md plus related documentation at the project root. Everything else — how state is managed, how tools are invoked, how sub-agents coordinate, how caches are reused, how context is compacted — will be reinvented inside each vendor’s harness.
Integration at the protocol layer can stay cross-vendor. Products at the runtime layer require picking a vendor and binding deep. That line is clearer and more important than it used to be. The Chat Completions era’s convenience of not caring whose infrastructure you run on has moved elsewhere in the agentic era.