
AI Coding Is Entering Its DevOps Moment

From June 23 to 24, 2026, Volcano Engine held its FORCE conference. In previous years, this kind of industry conference was usually dominated by multimodality, large-model call volume, and the MaaS business. This time, however, Volcano devoted a large share of the agenda to AI coding. The Economic Observer used a fitting headline in its coverage: ByteDance is filling in its AI coding gap.

At the conference, ByteDance released Doubao 2.1 Pro, with a focus on stronger coding and agent capabilities. The original TRAE SOLO was also upgraded into TRAE Work, which comes in two modes: Work for everyday office tasks and Code for software development.

For developers and engineering managers, though, the real signal from this conference was not the product-name change. It was the internal engineering data ByteDance made public.

Hong Dingkun, ByteDance’s vice president of technology and head of the TRAE business, said that over the past year the share of code produced by AI inside ByteDance had increased sixfold, and that within the TRAE team itself the share exceeded 90%. TRAE’s average daily Token consumption reached 5.6 trillion, 50 times the level of a year earlier. Volcano Engine president Tan Dai supplied the broader context: as of June this year, the Doubao large model’s average daily Token call volume had exceeded 180 trillion. China Entrepreneur reported these figures and noted that Volcano Engine had taken close to half of China’s domestic MaaS market.

Set side by side, these numbers raise a question. If AI can write code more than 10 times faster than humans, and more than 90% of the TRAE team’s code comes from AI, has the team’s engineering efficiency multiplied several times as well?

The answer is no. According to Hong Dingkun, per-capita requirement throughput rose by about 60%.

This data point matters because it gives both the input side and the output side. The input side is how much code AI wrote. The output side is how many requirements the team ultimately delivered. Once both sides are measured, the loss in the middle becomes visible: code generation volume grew extremely fast, but real delivery did not grow at the same rate.

Hong Dingkun gave several concrete examples at the conference. An engineer might only want to change one parameter, but AI copies out and rewrites a large block of related logic, leaving redundant code behind. AI can generate a piece of code quickly, but fitting it back into the existing architecture, aligning interfaces, integrating it, and testing it still takes substantial work. There is also a more typical organizational problem: inside ByteDance, some product managers used AI to build feature-complete pages and hoped to ship them directly, but the engineering team stopped them because the code needed refactoring for extensibility, security, and performance.

These problems point to the same bottleneck: AI has made code generation cheaper, but the downstream process of turning code into shippable software has not become cheaper at the same pace.
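One way to see why the gain stays bounded is a simplified model in the style of Amdahl's law. This is our illustration under stated assumptions, not a calculation ByteDance presented. Suppose writing code takes a fraction p of the time spent per requirement, AI makes only that step s times faster, and every other step keeps its old duration. Overall throughput then rises by a factor of 1 / ((1 - p) + p / s). If we plug in the figures above (s = 10, observed gain 1.6x) and solve for p, we get a rough sense of how much of delivery time was coding to begin with.

```python
def throughput_multiplier(coding_share: float, coding_speedup: float) -> float:
    """Amdahl-style estimate of the overall speedup in requirement throughput.

    coding_share: fraction of per-requirement time spent writing code (assumed).
    coding_speedup: how many times faster AI makes the coding step.
    All other steps (review, tests, integration, staging) keep their old duration.
    """
    if not 0.0 <= coding_share <= 1.0:
        raise ValueError("coding_share must be between 0 and 1")
    if coding_speedup <= 0:
        raise ValueError("coding_speedup must be positive")
    new_time = (1.0 - coding_share) + coding_share / coding_speedup
    return 1.0 / new_time


def implied_coding_share(coding_speedup: float, observed_multiplier: float) -> float:
    """Invert the model: which coding share would produce the observed gain?"""
    # 1/m = (1 - p) + p/s  =>  p = (1 - 1/m) / (1 - 1/s)
    return (1.0 - 1.0 / observed_multiplier) / (1.0 - 1.0 / coding_speedup)


if __name__ == "__main__":
    p = implied_coding_share(coding_speedup=10.0, observed_multiplier=1.6)
    print(f"implied coding share: {p:.3f}")                       # 0.417
    print(f"model check: {throughput_multiplier(p, 10.0):.2f}x")  # 1.60x
    # As coding_speedup grows without bound, the gain approaches 1 / (1 - p).
    print(f"ceiling: {1.0 / (1.0 - p):.2f}x")                     # 1.71x
```

Under these assumptions, coding would have been roughly 42% of per-requirement time. Even infinitely fast generation would then cap throughput at about 1.71x. The real split inside ByteDance is not known. What the model shows is that once coding becomes cheap, the steps AI did not touch dominate the total.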

Remio relayed another set of experimental data in its reporting. According to Remio, ByteDance ran more than 900 tests. Mainstream models could achieve code correctness above 80%, but without surrounding tool support, code deliverability was only around 40 to 60 points. After adding a harness, meaning validation and support tools such as testing, dependency checks, and staging, deliverability could rise to around 80 points. For now, this set of numbers mainly comes from media reporting. We have not yet seen an official white paper or full transcript, so attribution should be handled cautiously. But it points in the same direction as the fact that a 90% AI code share only translated into a 60% throughput gain.

What the FORCE conference really exposed was neither whether ByteDance is building AI coding tools nor how TRAE Work is positioned. It read more like a stress test after large-scale adoption: in the AI era, writing code is becoming extremely fast, but the bottleneck in software engineering is moving downstream.

The DevOps moment of AI coding: code generation is already fast, but a harness pipeline must pass it through review, testing, dependency checks, and staging before it becomes real delivery.

Code Generation Has Accelerated, but the Delivery Pipeline Has Not Caught Up

A 60% increase in per-capita requirement throughput would be a major gain for any mature engineering team. So this data should not be read as a failure of AI coding. What it really shows is something else: code generation speed and software delivery speed have become visibly disconnected.

You can think of AI-written code as semi-finished goods in a factory. When AI keeps producing code like an assembly line, downstream steps such as review, compilation, test execution, dependency management, deployment to staging environments, and security audits do not automatically accelerate. New code queues up in front of these checkpoints and turns into inventory. It has been produced, but it does not yet count as delivered.
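A toy queue makes the inventory metaphor concrete. The rates below are hypothetical. The model assumes a fixed daily validation capacity that does not grow with generation volume, so anything the agents produce beyond that capacity piles up.

```python
def simulate_backlog(days: int, generated_per_day: int, validated_per_day: int) -> list[int]:
    """Return the backlog of not-yet-validated changes at the end of each day.

    Assumes a constant daily arrival of AI-generated changes and a fixed
    downstream capacity (review + tests + staging) per day.
    """
    backlog = 0
    history = []
    for _ in range(days):
        backlog += generated_per_day
        # Downstream checkpoints can only clear a fixed number of changes per day.
        cleared = min(backlog, validated_per_day)
        backlog -= cleared
        history.append(backlog)
    return history


if __name__ == "__main__":
    # Hypothetical numbers: 30 changes generated per day, capacity to validate 20.
    print(simulate_backlog(days=5, generated_per_day=30, validated_per_day=20))
    # [10, 20, 30, 40, 50]
```

In this model, delivery is capped at 20 changes per day regardless of how many are generated. Raising generation only lengthens the queue. The only lever that increases output is widening the downstream stage.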

Software engineering has seen a similar bottleneck shift before.

Around 2010, agile development became more widespread. The frequency with which teams wrote code and submitted changes compressed from once every few months to once every few weeks, every few days, or even more often. But for many teams at the time, merging code into main, testing it, and deploying it to servers still depended on manual compilation, manually run tests, email-based operations scheduling, and handoffs to infrastructure teams. Development finished the change, but integration and deployment became a nightmare. The same pattern appears repeatedly in security: the real problem is often not the code snippet itself, but deployment, permissions, and environment boundaries. In an earlier analysis of vibe coding leak cases, the same structure appeared: the issue was not whether the code could run, but whether the deployment layer blocked the risk.

DevOps did not emerge because everyone suddenly developed a taste for CI/CD or YAML. It was forced into existence by high-frequency change. Once deployment frequency increased, the old delivery process could no longer keep up, and the industry had to automate testing, integration, and release. The point of a CI/CD pipeline is to let high-frequency changes enter production reliably and safely.

AI coding today is in a similar position. The difference is that this time, the part that suddenly became faster is not deployment, but code generation farther upstream. Token costs have fallen, model capabilities have improved, and code generation has become a low-cost supply. But downstream review, testing, dependency management, staging, architectural constraints, and security audits have not automatically become cheaper.

This explains why a 90% AI code share only turns into a 60% throughput increase. Code generation has become faster, but teams still lack a delivery pipeline that can match that speed. The role of the harness here is similar to the role CI/CD played back then: automatically run tests, check dependencies, enforce architectural constraints, spin up staging, and send high-frequency generated code through a repeatable validation process.

In other words, harness is the CI/CD of the AI code era.

The Team’s Absorption Bandwidth for AI Code

When discussing harness engineering, we usually look at several dimensions: how long a single agent can run, how multiple agents run in parallel, and how humans interact with agents. Volcano’s data adds another dimension: the team’s absorption bandwidth for AI code.

Tokens can be bought with money, and the throughput of AI-generated code can be scaled with compute. But a team’s ability to absorb that code does not automatically double. When a product manager uses AI to finish a page, engineers still need to check how network requests are sent, whether components can be reused, whether there is any privilege-escalation risk, and whether performance will drag down the page. AI amplifies code production, but review, architecture governance, and integration validation remain scarce.

Organizational absorption bandwidth: Tokens can scale the AI generation side, but review, architectural constraints, security audits, and context engineering determine how much AI code the team can digest.

The context engineering, architectural constraints, and knowledge accumulation ByteDance mentioned are essentially ways to widen this absorption pipeline.

Start with context engineering. If AI has to reread the codebase every time it takes on a task and re-infer the team’s historical baggage, naming conventions, and architectural preferences, Tokens are spent on repeated understanding rather than moving the requirement forward. Once code structure, technical decisions, and historical conventions are organized into context that AI can read directly, the code AI writes is more likely to fit the existing project. This matches the earlier argument about team context infrastructure: after model capabilities cross a certain threshold, the real differentiator is often whether the model can access dense, reusable context.
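Here is a rough sketch of what "context that AI can read directly" might look like. The code assembles recorded team conventions into a single preamble, most important first, under a fixed budget. The `ContextEntry` schema and the priority scheme are hypothetical. Counting words is only a crude stand-in for a real tokenizer.

```python
from dataclasses import dataclass


@dataclass
class ContextEntry:
    """One piece of team knowledge an agent should see (hypothetical schema)."""
    title: str
    body: str
    priority: int  # lower number = more important


def build_context(entries: list[ContextEntry], word_budget: int) -> str:
    """Assemble entries into a preamble, most important first, within a budget.

    Word count approximates token count; a real system would use its tokenizer.
    """
    parts: list[str] = []
    used = 0
    for entry in sorted(entries, key=lambda e: e.priority):
        text = f"## {entry.title}\n{entry.body}"
        cost = len(text.split())
        if used + cost > word_budget:
            continue  # skip entries that do not fit; smaller ones may still fit
        parts.append(text)
        used += cost
    return "\n\n".join(parts)


if __name__ == "__main__":
    entries = [
        ContextEntry("Naming", "Services end in Service; handlers end in Handler.", 2),
        ContextEntry("Layering", "ui may import service; service must not import ui.", 1),
        ContextEntry("Legacy API", "Keep /v1/orders: an external partner still calls it.", 3),
    ]
    print(build_context(entries, word_budget=25))
```

The budget is deliberate. Context competes with the task itself for the same Tokens, so curating and ranking it is part of the engineering work.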

Then look at architectural constraints. Letting AI write freely can easily increase the entropy of a codebase. Teams need to add rules to the pipeline that block code violating layering, dependency direction, or security boundaries before it enters main. The security boundary here is not an abstract concern. AI coding tools read project configuration, rule files, and code comments. In an earlier analysis of AI coding configuration-file injection, the core issue was which contents in the same repository are treated as data and which contents become instructions for the agent.
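A dependency-direction rule like this can be checked mechanically before merge. This minimal sketch assumes a hypothetical four-layer layout. Each module's top-level package names its layer, and imports may only point to the same layer or a lower one. In practice the import edges would be extracted from source; for Python, the standard `ast` module can do this. Dedicated tools such as ArchUnit for Java and import-linter for Python implement richer versions of this check.

```python
# Hypothetical layer order, lowest to highest. A module may import only from
# its own layer or from layers below it.
LAYERS = ["domain", "service", "api", "ui"]
RANK = {name: i for i, name in enumerate(LAYERS)}


def layer_of(module: str) -> str:
    """Assume the top-level package is the layer, e.g. 'service.orders' -> 'service'."""
    return module.split(".", 1)[0]


def find_violations(imports: list[tuple[str, str]]) -> list[str]:
    """Return (importer, imported) edges that point upward to a higher layer."""
    violations = []
    for importer, imported in imports:
        src, dst = layer_of(importer), layer_of(imported)
        # Modules outside the known layers are ignored rather than guessed at.
        if src in RANK and dst in RANK and RANK[dst] > RANK[src]:
            violations.append(f"{importer} -> {imported}")
    return violations


if __name__ == "__main__":
    edges = [
        ("ui.cart", "service.orders"),
        ("service.orders", "domain.order"),
        ("domain.order", "api.http"),  # domain reaching up into api: blocked
    ]
    print(find_violations(edges))  # ['domain.order -> api.http']
```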

Finally, there is knowledge accumulation. A pitfall one agent hits in module A should be reusable by another agent working on module B. If that experience still has to flow through human meetings, long documents, or verbal handoffs, knowledge movement will lag behind code generation speed.
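Knowledge accumulation can start as something simple: a shared store that agents write to when they hit a pitfall and read from before they begin a task. The `LessonStore` class, the tag scheme, and the example lesson below are all hypothetical. A production version would also need persistence, human review, and expiry of stale entries.

```python
from collections import defaultdict


class LessonStore:
    """A shared, append-only record of pitfalls agents have hit (hypothetical design)."""

    def __init__(self) -> None:
        self._by_tag: dict[str, list[str]] = defaultdict(list)

    def record(self, lesson: str, tags: list[str]) -> None:
        """File one lesson under every tag it relates to (module, service, etc.)."""
        for tag in tags:
            self._by_tag[tag].append(lesson)

    def lookup(self, tags: list[str]) -> list[str]:
        """Return lessons matching any tag, without duplicates, in first-seen order."""
        seen: dict[str, None] = {}
        for tag in tags:
            for lesson in self._by_tag.get(tag, []):
                seen.setdefault(lesson, None)
        return list(seen)


if __name__ == "__main__":
    store = LessonStore()
    # An agent working on module A records what went wrong.
    store.record("Payment retries need an idempotency key.", tags=["payments", "module-a"])
    # An agent working on module B, which also calls payments, checks first.
    print(store.lookup(["payments", "module-b"]))
```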

Stronger models will not automatically solve these problems. Even the smartest model does not know why your team kept a strange interface five years ago, nor what special requirements your organization has around permission control. The team itself has to deposit that information into the codebase and engineering environment.

The Next Stage of Competition Is Not on the Generation Side

If this judgment holds, the next stage of competition among AI coding tools will shift from “who writes code better” to “who can move AI-written code through the delivery pipeline reliably.”

Nor can engineering managers evaluate AI coding tools by AI code share alone. That metric looks only at the supply side. It is like measuring how fast a factory produces parts while ignoring how badly the inspection station and warehouse are clogged. Anthropic’s analysis of real Claude Code usage points to the same shift: users are moving from “ask AI to fix a bug” toward “ask AI to finish an entire piece of work,” which means the measurement unit also has to move from code snippets to delivery of complete work blocks.

What matters more are delivery-adjacent metrics: how long a requirement takes from start to merge or deployment, what percentage of AI-generated code passes tests on the first run, how much time senior engineers spend reviewing AI-submitted changes each day, whether rollback and incident rates change after launch, and how much AI-generated code remains after some time has passed.
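These metrics can be computed from ordinary change records. The sketch assumes a hypothetical `ChangeRecord` shape. The field names, the survival window for retained lines, and the use of the median for lead time are illustrative choices, not a standard definition.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass
class ChangeRecord:
    """One AI-assisted change, from start of work to merge (hypothetical fields)."""
    started: datetime
    merged: datetime
    passed_tests_first_run: bool
    review_minutes: float  # senior-engineer time spent reviewing this change
    rolled_back: bool      # reverted, or caused an incident after launch
    lines_added: int
    lines_surviving: int   # lines still present after a chosen window, e.g. 90 days


def delivery_metrics(records: list[ChangeRecord], days: int) -> dict[str, float]:
    """Summarize delivery-side metrics over a window of `days` days."""
    if not records or days <= 0:
        raise ValueError("need at least one record and a positive window")
    n = len(records)
    lead_hours = [(r.merged - r.started).total_seconds() / 3600 for r in records]
    added = sum(r.lines_added for r in records)
    return {
        "median_lead_time_hours": median(lead_hours),
        "first_pass_test_rate": sum(r.passed_tests_first_run for r in records) / n,
        "review_minutes_per_day": sum(r.review_minutes for r in records) / days,
        "rollback_rate": sum(r.rolled_back for r in records) / n,
        "code_retention": sum(r.lines_surviving for r in records) / added if added else 0.0,
    }


if __name__ == "__main__":
    t0 = datetime(2026, 7, 1, 9, 0)
    records = [
        ChangeRecord(t0, t0 + timedelta(hours=4), True, 20, False, 100, 90),
        ChangeRecord(t0, t0 + timedelta(hours=10), False, 45, True, 200, 50),
    ]
    print(delivery_metrics(records, days=1))
```

None of these numbers rises just because more code is generated. They move only when changes actually get through review, tests, and launch.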

These metrics measure final delivery efficiency, not headline numbers on the generation side.

The data from this FORCE conference gives the industry a reminder: the productivity dividend from AI writing code has already begun to materialize. The TRAE team’s 90% code contribution rate and daily consumption of trillions of Tokens both show that getting AI to write code is no longer the hardest part.

The next question is how to reduce the loss between code generation and code delivery. This cannot be solved by waiting for larger models alone. Teams need to do what they did when they built DevOps pipelines: fill in testing, review, dependency checks, staging, architectural constraints, and context infrastructure. Whoever gets this AI-code delivery pipeline running first will be able to turn AI coding into sustainable engineering efficiency.