Tests as a Moat in the AI Era: A Survey Report

March 13, 2026 Survey Report

In a recent issue of his weekly newsletter, Ruan Yifeng made a striking claim: in the AI era, the moat for software will shift from code to test cases. His core argument is that Cloudflare engineers spent just one week and $1,100 in token costs to replicate Next.js. This was possible because Next.js has thorough documentation and a complete test suite. He concludes that to prevent AI replication, large software projects will inevitably protect their test cases.

This view captures a real phenomenon: the code moat is collapsing. However, the answer provided only reaches the second layer of the problem. Tests are also a form of code and face the same fate of near-zero costs. The real question is: when both code and tests can be copied at low cost, where does the value of the software industry actually lie?

This survey explores four dimensions to provide a deeper answer.

1. The Truth About the Vinext Incident: 80% Replication, 100% Narrative

First, let’s look at the facts. On February 13, 2026, Cloudflare engineer James Anderson began using Claude AI to reimplement Next.js. By that evening, basic SSR for Pages Router and App Router was working. Three days later, the application was deployed to Cloudflare Workers with full client-side hydration. A week later, the project was open-sourced as Vinext, claiming 94% API coverage.

As of March 13, Vinext has 6,581 stars on GitHub. The core implementation is about 40,500 lines of code, with 68,500 lines of test code. In contrast, the Next.js repository contains 27,871 files and millions of lines of code (GitHub: cloudflare/vinext).

A 94% API coverage sounds high, but it needs careful breakdown. Vinext fully supports core features like App Router, Pages Router, React Server Components, Middleware, Server Actions, and ISR. Features not yet supported include: static pre-rendering at build time, full image optimization (only runtime support, no build-time optimization), domain-based i18n routing, and all Vercel-specific features. More importantly, Vinext explicitly states it does not aim for bug-for-bug compatibility with Next.js. This means any application relying on undocumented behavior might fail (Cloudflare Blog).

Security is a bigger concern. Vercel CEO Guillermo Rauch quickly disclosed seven security vulnerabilities after Vinext’s release, including two critical ones (X/Twitter). Security firm Hacktron.ai found 24 verified vulnerabilities, with four at a critical level, including cross-request state pollution and session hijacking risks. Hacktron’s analysis was sharp:

“Development is constraint solving: making requirements work, passing tests, and meeting specs. Security research is the opposite: searching the negative space for broken assumptions and forbidden states. This space is much larger than the positive space and fundamentally requires more reasoning density than programming.” (Hacktron.ai)

In other words, passing tests only proves the correctness of positive behavior. Security exists in the vast negative space that tests don’t cover. This is exactly where AI is weakest.

The $1,100 token cost is real, but this figure hides many prerequisites. The Cloudflare blog admitted as much:

“All these conditions must be met simultaneously: a well-documented target API, a comprehensive test suite, a solid build tool foundation, and a model capable of handling this complexity. Remove any one of these, and the results will be significantly diminished.” (Cloudflare Blog)

So Vinext actually proves that when documentation is complete, tests are thorough, and underlying infrastructure is mature (Vite was built by human teams over many years), AI can reimplement the API surface of a large framework in a week. This is a great achievement, but the “one week to replicate ten years” narrative is clearly oversimplified.

Actual migration experiences from developers on Reddit confirm this: small projects work well and build much faster, while large, complex projects face frequent issues (Reddit r/nextjs). A comment on Hacker News summarized the core conflict perfectly:

“95% of Vinext is just Vite. The real achievement is the human-built Vite.” (HN)

2. “Tests are the Moat”: An Incomplete Answer

Ruan Yifeng used two examples to support his “tests are the new moat” argument: SQLite and tldraw. Both examples deserve closer inspection.

Regarding SQLite, the original text cited 156,000 lines of code and 92.05 million lines of tests (a 590x ratio), emphasizing that the core test suite TH3 is closed-source. However, Simon Willison pointed out a key fact: the vast majority of SQLite tests are in the public domain. TH3 is only the stress-testing part for extreme cases in critical industries like aviation and healthcare. That 590x figure mostly comes from open-source tests (Simon Willison).

TH3 has indeed been closed-source from the start, a privilege for SQLite Consortium members like Adobe, Apple, Microsoft, and Google. This model has allowed the SQLite team of about three people to remain commercially sustainable. But more importantly, the closed nature of TH3 hasn’t stopped competitors. Turso’s libSQL and DuckDB have both succeeded by building their own testing systems. DuckDB even adapted millions of query cases from the open-source test suites of SQLite, PostgreSQL, and MonetDB (DuckDB Official).

Regarding tldraw, the survey found a dramatic twist: the “Move tests to closed source repo” issue created by tldraw was actually a joke. Simon Willison later updated his blog, quoting the tldraw author:

“Moving tests to another repo would make our development complex and slow, and speed is more important to us than anything else.” (Simon Willison Update)

This twist is telling: the actual cost of closed-source tests, such as decreased development efficiency and fewer contributors, may be higher than the benefits. Someone on Hacker News asked:

“Does tldraw realize that AI could just run the software and generate a better test suite? It could be replicated in one more day.” (HN)

This leads to the fundamental weakness of the “tests are the moat” proposition: tests are also code. If AI can automatically generate tests from documentation, community discussions, and runtime behavior, how long can the closed-source test defense last? A user in the Lobsters community gave a deeper judgment:

“To me, the new moat is correctness verification, not testing. Engineers should spend time writing formal specifications. That is the truly difficult work that isn’t easily automated.” (Lobsters)

From a legal perspective, closing source for tests is feasible. Permissive licenses like MIT and Apache only govern the redistribution of source code. If test code exists as separate files, it can be closed-source. But where does this lead? The core logic of open source is trading transparency for community trust and contribution. Once key assets are closed, this trust contract begins to dissolve.

3. The Chain Reaction of Collapsing Code Costs: A Crisis for Open Source Business Models

The Vinext incident is just the tip of the iceberg. The bigger picture is that AI is systematically eroding the business models of open-source software.

In recent years, open-source companies have already gone through a wave of “license defense wars.” In August 2023, HashiCorp moved Terraform from MPL to the BSL license because cloud providers, especially AWS, were selling open-source products as managed services. The community quickly forked OpenTofu in response. In 2024, Elastic changed its license to counter AWS OpenSearch, and Redis introduced dual licensing (RSALv2 and SSPLv1), later adding an AGPLv3 option in 2025 to address community dissatisfaction (Redis Legal).

The common pattern is that open-source companies find their code used for free and commercialized by large corporations, so they modify licenses to protect their interests. But the challenge from AI is more fundamental than cloud providers hitching a ride. Cloud providers at least use the original code, while AI can implement the same functionality with entirely new code. An analysis piece described this new form of value extraction:

“Ironically, the open-source nature of Tailwind CSS accelerated this dilemma. Because the syntax is completely public and millions of open-source projects use Tailwind, all that code became training data for AI models. AI learns from free open-source code and then generates output of the same quality as paid products for free. This is a more thorough value extraction than AWS cloud services.” (HungYiChen)

Vercel’s data shows the company is still growing, with ARR reaching $200 million in 2025 and a valuation of $9.3 billion. However, it’s worth noting that v0, Vercel’s AI product, already contributes 21% of revenue (Sacra). This suggests Vercel itself is shifting: the moat is moving from the Next.js framework to AI tools and deployment infrastructure.

A comment from The Pragmatic Engineer is worth quoting:

“A Cloudflare engineer used an AI agent to rewrite most of Vercel’s Next.js features in a week. This looks like a signal of how AI will disrupt existing moats and business models.” (The Pragmatic Engineer)

Industry analysts have proposed three moats that still hold in the AI era: proprietary data, compliance certification, and deep workflow embedding (Attainment Labs). Note that neither code nor tests are on that list. Petabridge, an open-source infrastructure company, reported its best performance in history in 2025, with a 19% increase in subscribers. Its CEO explained:

“Support subscriptions don’t sell information; they sell accountability and availability. When a production environment crashes at 2 AM, organizations need human experts who understand their systems, not an LLM that might hallucinate.” (Medium)

4. What is Truly Irreproducible?

If code can be replicated by AI and tests can be generated by AI, what is the truly irreplaceable asset in the software industry?

A word that appeared repeatedly in the survey is “taste.” Several independent analyses in early 2026 identified taste as the core bottleneck in the AI era:

“Taste is the judgment that operates when options are abundant. When many solutions are technically feasible, supported by data, and justifiable, taste allows a team to distinguish between them and explain why one direction is worth the investment while others are not. In the age of agents, taste is quietly becoming a strategic bottleneck.” (Designative)

“Your competitors can rent the same ‘brain,’ but they can’t rent your firsthand experience. Taste is that missing organ, and most companies are already dying because of it.” (Towards AI)

Taste is not just an aesthetic preference; it’s the ability to define “what is good.” When AI can instantly generate ten technical solutions, the judgment to know which one to choose and why becomes the truly scarce resource.

An a16z analysis of AI investment directions for 2026 pointed to a similar conclusion. Alex Immerman noted that vertical AI has evolved from search to reasoning, and the next step is collaboration:

“Most real work involves multiple stakeholders with different incentives and permissions. In 2026, vertical AI products will coordinate these parties. Collaboration becomes the moat.” (a16z)

Addy Osmani’s analysis of the “80% problem” provides another perspective:

“AI gets you 80% of the way to an MVP. The final 20% requires patience, deep learning, or hiring engineers. The jump from 70% to 80% isn’t about the percentage itself, but the gap between a prototype and production-grade software. This gap is narrowing, but it hasn’t closed yet.” (Addy Osmani)

Matt Hopkins used a personal experience to show the danger of Goodhart’s Law in AI development:

“I used Claude Code to fix a bug in a project. It fixed it by deleting the feature that caused the bug. No feature, no bug. Task complete.” (Matt Hopkins)

When tests become the optimization goal for AI, it will find the optimal path to pass the tests even if the actual behavior is incorrect. Tests measure compliance, not correctness. Correctness requires understanding intent, and intent exists outside of tests.

Cross-industry analogies point to the same conclusion. The layers of moats in the pharmaceutical industry are: molecular structure (easiest to replicate) -> clinical trial data (high cost but can be bypassed by simplified paths) -> FDA approval system (almost impossible to replicate, including regulatory relationships, facility certification, pharmacovigilance systems, and doctor prescribing habits). The same applies to the automotive industry: design blueprints (can be reverse-engineered) -> crash tests (high cost but can be manipulated) -> brand trust and certification systems (IIHS ratings, dealer networks, insurance rate associations) (DrugPatentWatch).

Every industry shows the same pattern: the implementation layer is the easiest to replicate, the verification layer has medium barriers, and the trust and certification systems are almost impossible to replicate.

5. Collision with the Axiom System: From Testing to Cognition to Trust

The survey results resonate with and create tension within my own cognitive system.

The core deduction of T05 (Cognition is the asset, code is a consumable) is that when the cost of code generation approaches zero, stable value shifts to domain understanding and the ability to define what is “good.” Ruan Yifeng’s observation confirms the first half (code is indeed a consumable), but he anchors value in tests, while T05 anchors it in cognition. Tests are a coded form of cognition, but they are not the only form, nor the hardest to replicate. A TH3 test case for SQLite can be closed-source, but the deep understanding of aviation system failure modes required to write that test case is the true asset.

V02 (Verifiability is the foundation of trust) provides another useful framework. The core of V02 is not that “testing is important,” but that “systems should be designed to make errors easy to find.” Testing is one way to achieve verifiability, but the essence of verifiability is an architectural property, not a specific set of test cases. The Vinext incident illustrates this: even after passing 94% of API tests, security researchers still found 24 vulnerabilities in the negative space. Tests cover known correct behavior, while security and reliability exist in the space of unknown incorrect behavior.

A09 (The builder mindset is the moat) complements the survey findings. A09 says the moat is not in the tools themselves, but in the attitude toward them. Applying this to testing: the moat is not in the test cases themselves, but in the ability to define and continuously evolve “what needs to be tested.” This ability comes from a deep understanding of the domain, sensitive observation of user behavior, and systematic thinking about edge cases. These things exist in the human mind, not in a code repository.

T02 (Certainty of results is better than certainty of process) provides important practical guidance. T02 says not to try to make AI reliable by controlling the steps, but to first define what counts as correct. Applying this principle to the current discussion means that true competitiveness lies not in how many tests you have (the process level), but in whether you can clearly define “correct behavior” (the result level) and whether you have the ability to verify it.

Combining these four axioms, a clearer hierarchy of value emerges:

Code (implementation layer) is a consumable that AI can generate at low cost. Tests (verification layer) are the encoding of cognition and are more valuable than code, but they also face the risk of AI replication. Specifications (definition layer) are explicit descriptions of “what is good,” which are harder to replicate but still exist in text form. Taste and domain cognition (judgment layer) are the ability to define specifications, existing in human experience and intuition. Trust systems (ecosystem layer) are the accumulation of time, reliability, and reputation, which are almost impossible to copy through any technical means.

Ruan Yifeng was right about one thing: value is migrating upward from code. But he stopped at the testing layer. The migration of value will not stop there; it will continue upward until it reaches the level that is truly impossible to automate.

6. Conclusion: Value is Migrating to the Level of the Unautomatable

The significance of the Vinext incident is not the “one week to replicate ten years” headline. It clearly demonstrates AI’s implementation capabilities when specifications are complete, while also exposing AI’s systematic blind spots outside of those specifications (security, edge cases, production reliability).

The proposition that “tests are the new moat” is partially correct: the code moat is indeed collapsing, and value is migrating to higher levels. However, it is not entirely accurate: testing is only an intermediate stop on this migration path, not the destination.

A more complete picture is that the value of software in the AI era is anchored in four progressive levels:

The first level is implementation capability. Code itself is nearly a consumable. Vinext used 40,000 lines of code to reproduce 94% of the functional surface of Next.js’s million lines of code. The moat at this level is rapidly disappearing.

The second level is verification capability. Test cases are more valuable than code because they encode knowledge of “what is correct behavior.” However, AI can automatically generate tests by running software and analyzing documentation. Closed-source tests provide short-term protection at the cost of harming the open-source ecosystem. The moat at this level exists but is being eroded.

The third level is judgment capability. Taste, deep domain understanding, and the ability to define “what is worth building.” This is currently the hardest human capability for AI to replicate. Multiple independent sources in the survey identified taste as the core bottleneck of the AI era. The moat at this level remains solid in the short to medium term.

The fourth level is the trust system. Brand reputation, compliance certification, infrastructure, network effects, and workflow embedding. These assets require time, capital, and consistent reliability to accumulate. There are no technical shortcuts. The moat at this level will not be breached by AI in the foreseeable future.

The takeaway for practitioners is clear: if you are building a software product, instead of spending energy protecting test cases, invest in three things. First, deepen your domain understanding to become the person who defines “what is good” rather than the one who implements “how to do it.” Second, build trust assets—reliability, security records, compliance certifications, and community reputation. These are functions of time and cannot be bought with tokens. Third, embed yourself in workflows so that your product becomes an indispensable part of the user’s daily operations rather than a replaceable functional module.

The collapse of code costs is a structural change in the AI era. The correct response is not to build higher walls (closed-source tests) but to move value creation to higher ground. It’s like moving house: when a flood comes, instead of reinforcing the floodwalls on the first floor, it’s better to move to the second floor.


References