Description: An honest side-by-side comparison of the major AI coding harnesses (Cursor, Codex, Claude Code, and OpenCode) against Google Antigravity, covering their origins and the real reasons behind Antigravity’s stagnation.
On the Chinese internet, anxiety about AI coding tools (coding assistants) comes in waves, like the tide.
At one moment, social media and public accounts are flooded with hype about Claude Code, as if anyone who has not switched to a terminal workflow will be left behind. At another, Codex is pushed aggressively in independent developer circles. A while later, collective anxiety builds over Cursor’s pricing or tightened quotas, with complaints that the tools are no longer effective.
In reality, there is no need for such anxiety about chasing trends.
For over 95% of routine, ordinary CRUD (create, read, update, delete) development tasks, major AI coding harnesses have highly converged in terms of underlying intelligence and assistant features. In today’s routine projects, they are completely interchangeable.
Google’s Antigravity, however, is the sole exception. It has fallen seriously behind in two dimensions: model API stability and software engineering maturity.
A sober side-by-side comparison shows that mainstream AI coding tools have reached saturation.
In a very small number of elite engineering tasks involving extremely long contexts or highly complex architectural refactoring, models do differ slightly in reasoning depth. For the vast majority of routine coding tasks, however, model capability far exceeds what the work requires.
In practice, developers often deliberately pass over top-tier models (such as Claude Opus 4.8 or GPT 5.5) in favor of cheaper, faster alternatives.
In real-world use, GLM-5.2, which has performed well recently, handles routine code modifications with very fast response times, and its intelligence is well above the utility threshold. The gap between models of this class and the top tier is occasionally noticeable in extremely long tasks, but in 95% of scenarios they perform without issue. This homogeneity among the underlying large models has eliminated any generational gap at the intelligence level.
If we lay out the feature matrix of major AI coding assistants, we see a highly overlapping landscape.
The figure below compares the feature matrix of the mainstream coding harnesses worldwide as of June 2026:
As shown in the matrix, capabilities like sub-agent dispatching, cloud-hosted workspaces, and background scheduled/delayed tasks are fully supported across all players.
On the client side, Cursor has an official mobile app, and OpenCode also has a mobile client, which supports iPadOS/visionOS and has SSH tunneling built in. The OpenCode client is not official commercial software but a fully open-source project developed by Yage (iOS version: opencode_ios_client; Android version: opencode_android_client, which can be deployed directly by cloning the Git repository). This shows that, feature by feature, the open-source ecosystem and third-party plugins have rapidly eroded the barriers built by official commercial tools.
For delayed task execution (e.g., setting a timer, such as a background reminder to review something), Claude Code offers interactive support via local Desktop Routines, while OpenCode provides a local, general-purpose process manager, Process Launcher, which persists delayed tasks in SQLite and automatically compensates for missed runs. There is no longer any fundamental generational gap among the major tools in terms of features.
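The idea of persisting delayed tasks and compensating for missed runs can be sketched in a few lines. This is a minimal illustration only, not OpenCode’s actual implementation: the table name `delayed_tasks` and the functions `open_store`, `schedule`, and `run_due` are hypothetical names chosen for this example.

```python
import sqlite3
import time


def open_store(path=":memory:"):
    """Open (or create) the task store. Pass a file path to survive restarts."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS delayed_tasks ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " due_at REAL NOT NULL,"
        " done INTEGER NOT NULL DEFAULT 0)"
    )
    db.commit()
    return db


def schedule(db, name, due_at):
    """Persist a task that should run at the Unix timestamp `due_at`."""
    db.execute(
        "INSERT INTO delayed_tasks (name, due_at) VALUES (?, ?)", (name, due_at)
    )
    db.commit()


def run_due(db, now=None, handler=print):
    """Run every pending task whose due time has passed.

    Calling this once at startup is what compensates for missed runs:
    tasks that came due while the process was down are still pending
    in the table and are picked up here.
    """
    now = time.time() if now is None else now
    rows = db.execute(
        "SELECT id, name FROM delayed_tasks"
        " WHERE done = 0 AND due_at <= ? ORDER BY due_at",
        (now,),
    ).fetchall()
    for task_id, name in rows:
        handler(name)
        db.execute("UPDATE delayed_tasks SET done = 1 WHERE id = ?", (task_id,))
    db.commit()
    return [name for _, name in rows]
```

Because pending tasks live in the database rather than in an in-memory timer, a restart loses nothing; the trade-off is that a missed task runs late rather than at its exact scheduled time.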
Claude Code’s exclusive Agent Teams (collaborating via 13 peer-to-peer operation APIs) and Dynamic Workflows (dynamically generating JS orchestration scripts to handle large-scale sub-agents) attracted significant attention upon release. However, in practical engineering environments, these mechanisms remain in their infancy, and their actual user experience is severely degraded by several issues:
These engineering pain points directly offset the advantages of the high-end features mentioned above, dragging Claude Code’s daily performance back to the same level as Cursor, Codex, and OpenCode.
This leads to an obvious conclusion: in routine development, major AI coding harnesses are highly interchangeable. If developers cannot perceive the differences between them at a given moment, choosing any of them will not affect work efficiency. Once a development task becomes complex enough to explicitly expose the subtle gaps in performance of the underlying models, developers will naturally develop a clear preference and will no longer need to search for comparison guides.
Since mainstream tools are largely similar, why do we single out Google’s Antigravity as the exception?
Because it has fallen seriously behind in two dimensions: model API stability and software engineering maturity.
The underlying Gemini 3.5 Flash model responds quickly. Although logic flaws occasionally occur, it is generally in the same tier as GLM. What truly disrupts the development workflow, however, is how the Gemini API itself accounts for output tokens.
During development, if we use a non-official coding harness (such as accessing the Gemini API via LiteLLM or OpenCode) to execute long output tasks, we frequently encounter situations where the model’s output abruptly terminates mid-way.
Through technical debugging, we uncovered the underlying API logic flaw:
Gemini 3.0+ introduced an internal Thinking Budget mechanism. When processing complex coding logic in adaptive thinking mode, the model’s internal reasoning trace is also charged against the maximum number of output tokens the API allows (max_output_tokens).
This creates a conflict: even if the maximum output length is configured very generously, when the Gemini API encounters a complex coding task, its internal thinking mechanism generates an extremely long reasoning trace and exhausts max_output_tokens before delivering the actual code. As a result, the main body of the generated code hits the length limit and is truncated mid-stream. Furthermore, because this adaptive thinking typically produces no timely streaming feedback, the client easily treats the silent connection as a timeout and forcibly disconnects it.
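The arithmetic behind this truncation is easy to see with a toy model. The sketch below assumes, as described above, that thinking tokens and visible output share a single cap; the `Generation` class and the helper functions are hypothetical names for illustration, and nothing here calls the real Gemini API.

```python
from dataclasses import dataclass


@dataclass
class Generation:
    thinking_tokens: int  # tokens spent on the hidden reasoning trace
    answer_tokens: int    # tokens the model needs for the visible code


def visible_tokens(gen, max_output_tokens):
    """Answer tokens that still fit once thinking is charged to the same cap."""
    remaining = max(0, max_output_tokens - gen.thinking_tokens)
    return min(gen.answer_tokens, remaining)


def is_truncated(gen, max_output_tokens):
    """True when the visible answer is cut off by the shared cap."""
    return visible_tokens(gen, max_output_tokens) < gen.answer_tokens


def required_cap(gen):
    """Smallest cap that lets both the reasoning and the full answer through."""
    return gen.thinking_tokens + gen.answer_tokens
```

With a cap of 32,768 tokens, a task that spends 30,000 tokens thinking leaves only 2,768 for the answer, so a 6,000-token code file is cut off even though the cap looks generous. In this model the cap would have to be at least 36,000 for the same request to complete.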
Although the Google team has made targeted improvements in the official native Antigravity client to restore some stability, the inherent mechanism of the underlying API still poses a constant risk of code truncation during development.
Beyond the API design, Antigravity’s software maturity also exhibits obvious engineering limitations.
The figure below shows Google Antigravity’s fragmented product architecture:
Google has split it into five independent components (Antigravity 2.0 Desktop, Antigravity IDE, Antigravity CLI, SDK, and Cloud API) and abruptly deprecated the legacy Gemini CLI on June 18, forcing users to reconfigure and migrate. This redundant product lineup, a result of the internal horse-race style of development typical of large tech companies, creates unnecessary confusion for users.
Stripped of over-marketed gimmicks, the most realistic conclusion is that, apart from the lagging Antigravity, the other four harnesses (Cursor, Codex, Claude Code, and OpenCode) show no generational gap in user experience for daily development tasks. As they converge, they have become interchangeable to the point that switching between them is barely noticeable.
Therefore, choosing a tool today is not about determining superiority at the intelligence level, but rather depends on individual emphasis on the following workflow characteristics:
Amid the rapid convergence of large model capabilities, Google’s Antigravity has fallen completely behind. At the model API level, it leaves long code generation vulnerable to sudden truncation because thinking tokens compete with the output limit. In software engineering, it is mired in deadlocks, freezes, and a fragmented five-component naming scheme. Once the “Made by Google” halo is removed, it no longer merits comparison with the mainstream tools in this era of seamless interchangeability.