Over the past three months, the release notes of leading AI coding tools have converged to a remarkable degree. For day-to-day development, the major AI coding harnesses have become largely interchangeable. This convergence is no coincidence: it reflects a fundamental shift in how humans and machines collaborate. We are no longer using one-off chat assistants; we are hiring “virtual interns.” To understand this convergence, we must first see where large models sit within the development workflow.
In day-to-day development, large models behave very much like virtual interns, and they do improve development efficiency. First, they retrieve information extremely fast: a human needs weeks to become familiar with a system, while a model can scan it in seconds. Second, they never tire. They can work around the clock and respond in seconds, at a running cost far below that of hiring human programmers.
However, this subordinate also has clear limitations. First, it lacks spatial awareness: unable to see the rendered page, it struggles to distinguish buttons from input fields. Second, it easily loses the plot: in long tasks, it readily drifts off target and spirals into endless loops. Third, it lacks common sense: it cannot judge whether a change is safe, and a careless action can easily corrupt files.
The core problem is that large models cannot reliably verify their own work. Taken together, these limitations mean they cannot deliver finished products on their own. To let the virtual subordinate work safely, the developer must establish clear guardrails. These guardrails are precisely the missing pieces that the mainstream tools are now converging to supply.
To address the limitations above, cutting-edge tools have produced highly convergent solutions on the control plane:
These highly convergent features are, at their core, management tools aimed at virtual interns.
The most glaring shortcoming of a virtual subordinate is the inability to self-verify. It often submits code immediately after writing it, with no idea whether the code is correct. If allowed to run blindly, the agent easily spirals into infinite loops or logic drift.
To bridge this gap, agents cannot work in a vacuum. This is the core premise we discussed in Loop Engineering. To achieve self-convergence, developers need to build a complete verification foundation locally:
But even with this verification foundation in place, full unattended operation remains elusive. This is primarily because large models have inherent behavioral limitations.
During long tasks, even top-tier models like Opus 4.8 slack off. After hitting multiple compile errors, such a model tends to find excuses to terminate the task, replying with things like “it’s getting late, let’s wrap up here for today.” Codex’s agent loop enforces a done when routine within the harness specifically to counteract this tendency to call it a day early.
Thus, we need overseer functionality on the control plane. This is the self-running loop feature that every major harness is competing over.
This is no magic trick, but an engineering assist meant to compensate for model limitations. The system programmatically identifies coasting behavior and forcibly pulls the agent back on track until the standard is met.
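The overseer idea described above can be sketched in a few lines. This is only an illustrative sketch under stated assumptions, not how any particular harness implements it: `agent_step`, `goal_met`, and the phrases in `COASTING_MARKERS` are hypothetical stand-ins for a real model call, a real verification check (for example a test run), and a real coasting detector.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical phrases suggesting the agent is trying to stop early.
COASTING_MARKERS = ("wrap up here", "getting late", "continue tomorrow")


@dataclass
class LoopResult:
    done: bool    # True if the goal check passed
    rounds: int   # number of agent rounds that were run
    nudges: int   # how often the agent was pulled back on track


def looks_like_coasting(reply: str) -> bool:
    """Crude text check; a real harness would use a stronger signal."""
    text = reply.lower()
    return any(marker in text for marker in COASTING_MARKERS)


def run_until_done(
    agent_step: Callable[[str], str],  # hypothetical: sends a prompt, returns the reply
    goal_met: Callable[[], bool],      # hypothetical: e.g. "do the tests pass?"
    goal: str,
    max_rounds: int = 20,
) -> LoopResult:
    prompt = goal
    nudges = 0
    for round_no in range(1, max_rounds + 1):
        reply = agent_step(prompt)
        # The exit condition is driven by state, not by what the agent claims.
        if goal_met():
            return LoopResult(True, round_no, nudges)
        if looks_like_coasting(reply):
            nudges += 1
            prompt = f"The goal is not met yet. Keep working on: {goal}"
        else:
            prompt = f"Checks are still failing. Continue: {goal}"
    # Hard cap so the loop cannot run (and spend tokens) forever.
    return LoopResult(False, max_rounds, nudges)
```

The design choice worth noting is that the loop exits only on `goal_met()` or the round cap; the agent saying it is finished never ends the loop by itself.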
In engineering terms, the differences between this self-running overseer mechanism, periodic polling, and scheduled tasks are shown in the table below:
| Dimension | Goal-Driven Self-Running Loop | Session-Level Periodic Polling | Persistent Scheduled Tasks |
|---|---|---|---|
| Exit Condition | State-driven, includes goal-completion check | No exit condition check, fixed repetition | No exit condition check, fixed repetition |
| Trigger Method | Driven by core logic or model state | Triggered by fixed time intervals | Triggered by fixed time schedules |
| Lifecycle | Runs until goal achieved or user terminates | Expires when the current terminal session ends | Survives reboots, with catch-up runs for missed intervals |
| Representative Implementation | Claude Code /goal command | Claude Code CLI /loop | Antigravity /schedule command |
Currently, only a small number of tools possess a genuine goal-driven self-running loop. Claude Code’s /goal feature allows the agent to autonomously carry out multiple rounds of modifications. In OpenAI’s tests, an agent ran continuously for twenty-five hours and generated thirty thousand lines of code. Cursor’s /loop skill, meanwhile, mixes in periodic scheduling and is moving in the same direction.
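The Lifecycle row of the table mentions "catch-up runs for missed intervals" for persistent scheduled tasks. A minimal sketch of that idea follows, assuming the scheduler persists the time of its last run; the function name `missed_runs` and its parameters are hypothetical and not taken from any of the tools named above.

```python
from datetime import datetime, timedelta


def missed_runs(last_run: datetime, now: datetime, interval: timedelta) -> list[datetime]:
    """Return every scheduled time after last_run, up to and including now."""
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    due = []
    next_run = last_run + interval
    while next_run <= now:
        due.append(next_run)
        next_run += interval
    return due


# Example: the machine was off from just after midnight until 03:30.
last = datetime(2024, 1, 1, 0, 0)
now = datetime(2024, 1, 1, 3, 30)
for slot in missed_runs(last, now, timedelta(hours=1)):
    print("catch-up run for", slot.isoformat())
```

A session-level poll has no persisted `last_run`, which is why it cannot catch up after the terminal session ends.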
In human-machine collaboration, visual and spatial intent is hard to convey with pure text. A developer can hardly describe the offset or layout of a button through chat messages alone. Therefore, both sides need a shared visual canvas to eliminate communication friction. This is not only so that the manager can visually verify the agent’s output, but also to satisfy the need for bidirectional synchronization between design and code in team collaboration.
The spread of this reconciliation canvas reflects a systemic convergence underway in the R&D field.
A clear example is Figma and the coding tools converging from opposite ends: Figma, a design tool, has introduced code layers and MCP services, while Cursor, a coding tool, has launched shared canvases.
Design tools are reaching downward; development tools are extending upward. The two sides have ultimately converged on this visual reconciliation canvas as common ground. This breaks the traditional code-generation workflow and fully dissolves the boundary between development and design.
Traditional desktop programming is a strongly synchronous mode of work demanding instant feedback. But in the async long-task model of managing a virtual subordinate, things change. When an agent executes a long-running task, it often takes tens of minutes or even hours. If the developer has to sit motionless in front of the screen staring at logs, the mental drain is immense.
Thus, human-machine collaboration must become asynchronous, and the phone becomes the key to decoupling. The mobile terminal does not write code; it handles async control of long-range tasks.
First, it serves as a real-time status monitoring tool. Because agents carry the risk of going off course and consume tokens, the manager needs to keep track of progress via mobile at any time to prevent cost from spiraling out of control.
Second, it acts as a lightweight decision gate. When the agent hits a security confirmation or a critical decision in the background, it pauses execution and pushes a notification to the phone.
The manager taps on the phone to authorize or terminate. Development work is thus decoupled from the desktop and becomes async long-range delegation.
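The decision gate described above can be sketched as follows. This is a simplified illustration, not any vendor's mobile API: the `notify` callback stands in for a hypothetical push-and-wait channel to the phone, and the `risky_prefixes` list is an assumed policy chosen only for the example.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable


class Decision(Enum):
    APPROVE = "approve"
    TERMINATE = "terminate"


@dataclass
class ApprovalGate:
    # Hypothetical channel: sends a message to the phone and blocks until the manager taps.
    notify: Callable[[str], Decision]
    # Assumed policy: commands starting with these prefixes need a human decision.
    risky_prefixes: tuple[str, ...] = ("rm ", "git push", "curl ")
    log: list[str] = field(default_factory=list)

    def needs_approval(self, command: str) -> bool:
        return command.strip().startswith(self.risky_prefixes)

    def run(self, command: str, execute: Callable[[str], None]) -> bool:
        """Execute the command unless the manager terminates it. Returns True if it ran."""
        if self.needs_approval(command):
            decision = self.notify(f"Agent wants to run: {command}")
            self.log.append(f"{decision.value}: {command}")
            if decision is Decision.TERMINATE:
                return False
        execute(command)
        return True
```

Routine commands pass straight through, so the manager is interrupted only at the points where a human decision actually matters.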
That said, these features point in two opposite directions. Self-running loops and shared canvases push the agent toward an async, organizational mode: the agent runs autonomously for long periods and produces output shared by the team. The iOS app and Design Mode push the agent the opposite way, toward a personal, close-at-hand mode: the user monitors from the phone at any time and annotates directly on the interface. Both directions appeared in a single release. This suggests Cursor is betting on both paths simultaneously and has not yet converged on an end state.
Human developers possess basic risk awareness, but a virtual subordinate needs the system to set up safety lines. To prevent the subordinate from tampering with critical files, the harness must construct an isolated runtime environment. This is the role of Claude Code’s defense system and the OpenHands sandbox. The system uses container isolation for the runtime, preventing the agent from reading secrets or executing dangerous commands.
Currently, agents remain merely single-user, single-goal async task executors. To elevate a virtual assistant into a true collaborator, the system still needs to achieve multiple technical leaps. These include persistent context, fine-grained permission control, and token budget management.
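The policy side of these safety lines, together with a simple token budget, can be illustrated with a small sketch. It is not container isolation and does not reflect how Claude Code or OpenHands work internally; the class `WorkspaceGuard`, its `protected` names, and the budget numbers are all hypothetical, chosen only to show what a path check and a spending cap might look like.

```python
from pathlib import Path


class GuardError(Exception):
    """Raised when the agent attempts something outside its allowed limits."""


class WorkspaceGuard:
    def __init__(
        self,
        root: str,
        protected: tuple[str, ...] = (".env", ".git"),  # assumed sensitive names
        token_budget: int = 100_000,                    # assumed spending cap
    ):
        self.root = Path(root).resolve()
        self.protected = protected
        self.tokens_left = token_budget

    def check_write(self, target: str) -> Path:
        """Return the resolved path if the write is allowed, else raise GuardError."""
        path = (self.root / target).resolve()
        # Reject anything that escapes the workspace, e.g. via ".." or an absolute path.
        if not path.is_relative_to(self.root):
            raise GuardError(f"outside workspace: {target}")
        # Reject protected files and directories anywhere inside the workspace.
        if any(part in self.protected for part in path.relative_to(self.root).parts):
            raise GuardError(f"protected path: {target}")
        return path

    def spend(self, tokens: int) -> int:
        """Deduct tokens from the budget and return what is left."""
        if tokens > self.tokens_left:
            raise GuardError("token budget exhausted")
        self.tokens_left -= tokens
        return self.tokens_left
```

A check like this runs inside the harness process, so it complements real isolation rather than replacing it; `Path.is_relative_to` requires Python 3.9 or later.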
The convergence of harness features is an inevitable product of underlying model homogenization. While the end state of each tool’s convergence path remains undecided, upgrading management habits is the more urgent matter. We should rethink how we use agents, and try shifting from real-time chat toward long-duration tasks. Developers need to upgrade from being drivers who write code to being qualified project managers. This cognitive upgrade often happens before release notes start looking alike.