AI Products & Platforms · Trust & Governance

AI Search Is Being Infiltrated by Content Farms

2026-04-19

A friend in our group chat, yousa (yousali.com), wanted to buy wool dryer balls and used AI search to look up information. The answer came back citing a 2022 comparative study from the University of Wisconsin’s extension department, a 2023 textile lifecycle report from MIT, and two sets of data precise to two decimal places, attributed to ASTM and AATCC testing standards.

All of it was fabricated.

The University of Wisconsin extension department does exist; that study does not. MIT never published that report. The ASTM and AATCC standard numbers were formatted correctly, but the numbers corresponded to measurement categories that had nothing to do with the cited figures. Real institutions, real formatting, fake content.

yousa read a few more search results and noticed the skeletal structure of the citations was nearly identical across articles, yet the wording in each was different enough that any single piece read like independent writing. The domains pointed to Alibaba subdomain pages and a cluster of small sites targeting the English-language market.

One person getting burned buying dryer balls is not a big deal in itself. But the phenomenon behind it has reached industrial scale: English-language content farms operated by Chinese teams, churning out AI-mass-produced articles stuffed with fabricated academic citations, are systematically colonizing the retrieval pool that AI web search draws from. For everyday consumer queries, these site networks already outrank Wirecutter and Consumer Reports.

Multiple Lines of Evidence, One Conclusion

The logic of RAG and web-connected search is to have the model answer based on its sources. For academic and current-affairs queries, the source pool contains strong mainstream media and institutional content, so this works well. Consumer and lifestyle queries are the opposite: the source pool itself is being polluted, and the more faithfully the model cites, the less reliable the output becomes.

NewsGuard has been continuously tracking AI content farms. In November 2024 they were monitoring 1,121 such sites; by March 2026 the count had grown to 3,006, an average of well over a hundred new sites a month. Over the same period, they ran monthly tests of how often ten mainstream AI tools repeated popular false claims. The repetition rate rose from 18% in August 2024 to 35% in August 2025, while refusal rates dropped from 31% to near zero, because every product had added web search. Models no longer decline to answer when uncertain; instead they search the web for sources and repeat whatever they find.

Ahrefs ran an experiment: they fabricated a luxury brand from scratch, published a brand story on Medium, then queried eight AI search tools. Perplexity and Gemini had error rates of 37% to 39%; ChatGPT was below 7%. ZipTie tested from a different angle and found that over 60% of source links returned by ChatGPT search pointed to incorrect content. Both experiments point to the same weakness: the citation layer of AI search is thin, and the correspondence between sources and conclusions is far less reliable than the product interface suggests.

These are not isolated incidents. The supply side has industrialized. Cisco Talos and Palo Alto Networks exposed two cross-border SEO poisoning operations in 2025, codenamed DragonRank and Operation Rewrite. The primary operators were Chinese teams, and the campaigns affected over 200 countries. The technique involves compromising enterprise servers and serving different pages depending on the visitor’s identity. A more advanced variant is called AI-targeted cloaking: servers identify the crawler user agents of ChatGPT, Perplexity, and Gemini and serve them a version of content optimized specifically for AI, while regular browsers visiting the same URL see a completely different page. SPLX’s analysis published on The Hacker News documents the full technical chain.

The Chinese domestic side has its own version. The 2026 315 Gala (China’s annual consumer rights broadcast) demonstrated the Liqing GEO optimization system live on air. The operator created a fictitious smart wristband called Apollo-9, claiming it featured quantum entanglement sensing and black-hole-grade battery life. The software automatically generated over a dozen fake reviews, fake expert endorsements, and fake industry rankings, then posted them to preset self-media accounts. Within two hours, an AI model began recommending the product; within three days, two major AI platforms had placed it on their smart wristband recommendation lists. 21st Century Business Herald, citing data from CAICT (China Academy of Information and Communications Technology), reported that the domestic GEO services market reached 4.2 billion yuan in 2024, growing at 38% annually, and is projected to exceed 18 billion yuan by 2026. The Paper traced Liqing’s operator to Beijing Lisi Culture Media, incorporated in 2018 with 1 million yuan in registered capital and one employee on its social insurance rolls as of 2025. The person in charge said on camera: spending several million yuan to poison a competitor is a reasonable investment.

Fabricated academic citations on the English side, GEO-manufactured brand narratives on the Chinese side. Same supply chain, two outlets.

The fabricated-citation thread has further documentation. After Mata v. Avianca, cases of lawyers being sanctioned for submitting ChatGPT-fabricated case law have become a recurring pattern. GPT-generated papers have entered Google Scholar’s index via automated crawlers. GhostCite tested 370,000 AI-generated citations across 40 fields and found hallucination rates ranging from 14% to 95%. The common pattern: the more a domain relies on authority signals to judge credibility, the more sophisticated the fabrication becomes.

What AI Changed

Content farms have always existed. What AI changed is the economics of three specific mechanisms.

The first is the collapse of deduplication. Legacy content farms copied from each other: paragraphs overlapped, images were reused, and search engines could easily identify same-source duplication. AI rewriting scrambles the surface layer entirely. Each article reads as if written independently by a different person, but the underlying factual claims are identical. When search engines encounter a batch of pages that look different but reach the same conclusion, they tend to treat them as multi-source consensus rather than single-source proliferation. This inverts the deduplication logic.
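To make the mechanism concrete, here is a minimal Python sketch. The two passages and the cited standard number are made up for illustration (mirroring the article's fabricated-claim scenario, not any real source); the shingle comparison mirrors classic near-duplicate detection. Surface overlap between the rewrites is near zero, yet the extractable claims are identical.

```python
# Minimal sketch of why shingle-based deduplication misses AI-rewritten clones.
# Both passages and the "ASTM D5867 / 24.63%" claim are fabricated examples.
import re

def shingles(text: str, k: int = 5) -> set:
    """Lowercased k-word shingles, the unit classic near-duplicate detection compares."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

# Two AI-style rewrites of the same fabricated claim.
page_a = ("Independent ASTM D5867 testing found that wool dryer balls cut "
          "average drying time by 24.63 percent across three load types.")
page_b = ("Across three categories of laundry loads, drying times fell by "
          "24.63% on average in trials run under the ASTM D5867 protocol.")

# Surface similarity is near zero, so a shingle-based deduplicator treats the
# pages as independent sources...
print(f"shingle Jaccard: {jaccard(shingles(page_a), shingles(page_b)):.2f}")

# ...while the underlying claims (standard number, suspiciously precise figure) are identical.
claims = re.compile(r"ASTM\s+D\d{4}|\d+\.\d{2}\s*(?:%|percent)")
print("claims in A:", claims.findall(page_a))
print("claims in B:", claims.findall(page_b))
```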

The second is that trust signals can now be manufactured at scale. ASTM followed by D plus four digits, a university name followed by “Extension,” a year falling within the last three to five years. These format combinations are the intuitive cues English-language readers use to judge whether content is serious. Inserting them as templates into mass-produced articles exploits the reader’s default trust in institutions. Teams more familiar with the English academic system face lower fabrication costs, because they know which format combinations pass the reader’s intuitive check. This also explains why much of the sophisticated fabrication originates from Chinese teams fluent in both Chinese and English academic ecosystems: their arbitrage space between the two systems is the largest.

But readers who are familiar with English academic writing are, paradoxically, more vulnerable. Their credibility judgments run on pattern matching: standard numbers, university extensions, recent years trigger an automatic high-confidence score. Clicking through to the source text reveals nothing wrong either, because AI rewriting has eliminated the stiffness of non-native prose, and paragraph structure and terminology conform to academic norms. This verification habit rests on an outdated premise: that untrustworthy content will betray itself through formatting or language flaws. AI rewriting plus templated fabrication invalidates that premise. Ironically, people who only rely on friend recommendations and video demonstrations and never read review articles are safer for this category of queries. Before fabrication quality catches up to genuine quality, knowledge helps with discernment; after it catches up, the same knowledge drives greater confidence in the wrong direction.

The third is that the attack surface has expanded to crawler-exclusive channels. AI-targeted cloaking means the server returns different pages to AI crawlers and human browsers. A user who clicks through to verify sees normal content; the AI citing the same URL received a different version. The last line of defense, manual verification, is thereby neutralized.
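A crude counter-check is still possible: fetch the same URL once with a browser User-Agent and once with an AI-crawler User-Agent, and compare what comes back. The sketch below is an assumption-laden illustration, not a tool: it uses the `requests` library, an example crawler agent string, a placeholder URL, and a rough similarity threshold. Real crawlers are also distinguished by IP range, so a matching response is not proof the site plays fair.

```python
# Minimal cloaking spot-check: does the server return different content to an
# AI-crawler User-Agent than to a browser? UA strings, URL, and threshold are examples.
import difflib
import re
import requests

BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/126.0"
AI_CRAWLER_UA = "GPTBot/1.0"  # example agent string; adjust to the crawler you care about

def visible_text(html: str) -> str:
    """Crude tag stripper; a real check would render the page or use a proper parser."""
    return re.sub(r"\s+", " ", re.sub(r"<[^>]+>", " ", html)).strip()

def cloaking_score(url: str) -> float:
    """Similarity between what a browser UA and an AI-crawler UA receive (1.0 = identical)."""
    as_browser = requests.get(url, headers={"User-Agent": BROWSER_UA}, timeout=15).text
    as_crawler = requests.get(url, headers={"User-Agent": AI_CRAWLER_UA}, timeout=15).text
    return difflib.SequenceMatcher(None, visible_text(as_browser), visible_text(as_crawler)).ratio()

if __name__ == "__main__":
    score = cloaking_score("https://example.com/review")  # placeholder URL
    print(f"browser/crawler similarity: {score:.2f}")
    if score < 0.7:  # rough heuristic, not a calibrated threshold
        print("pages diverge substantially; possible AI-targeted cloaking")
```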

A 2024 study from Harvard Kennedy School found that LLMs cite a pro-Kremlin propaganda aggregation site on niche topics, because such queries lack mainstream sources and whoever fills the void first becomes the default answer. Many consumer product queries occupy the same vacuum: no Wikipedia entry, no in-depth reviews from major outlets, just site networks. The model dutifully retrieves the content that best fits format requirements, and polished formatting is precisely the core output of content farms.

Users Bear the Final Cost

Model providers can improve retrieval algorithms, downrank low-trust domains, and add source diversity scoring. These measures all help. The problem is that the marginal cost of fabrication is falling just as fast, driven by the same generation of AI tools. Both sides of the attack-defense dynamic share the same technology stack, and the defense side cannot catch up to the supply side's iteration speed within any foreseeable timeframe. Until the two reach some equilibrium, users absorb the cost of wrong answers: buying useless products, taking the wrong supplements, using products for their children that do not meet safety standards.

Countering this kind of pollution requires an entirely different approach from countering traditional hallucination. Traditional hallucination is the model generating a claim from nothing; a single fact-check suffices. This pollution comes with fabricated provenance. The verification workload is far greater: you start from the citations the AI provides, trace each one back to its primary source, confirm that the source actually exists, and confirm it actually says what the model claims. Fact-checking can be crowdsourced and pre-built into databases for known claims; provenance verification faces independently fabricated citations each time, with no existing database to query against. The time a typical user spends on provenance verification for a routine query may already exceed the time AI search saved. If provenance verification is needed every time, the net efficiency of AI web search in these scenarios is negative.

User behavior is already shifting. People append “reddit” to search queries, specify independent reviewers, read a specific author’s long-running blog, watch videos instead of reading review articles. These practices share a common underlying logic: abandoning attempts to judge truth from text content itself and relying instead on identity signals that are hard to fabricate at scale. Accounts with history, real people with reputations, personal brands with accumulated bodies of work. Users are willing to pay the extra search cost.

Multi-source consensus has long been the core basis for judging whether information is true, premised on the assumption that manufacturing an independent source is expensive enough. AI has driven that cost to near zero, which means multi-source agreement is no longer a reliable signal. At present, neither human judgment habits, AI product design, nor regulatory frameworks have found a replacement metric. The path for determining facts is shifting from statistical judgment (how many sources agree) to causal judgment (can you trace back to the original scene), and the corresponding tools and habits need to change accordingly.

Practical Recommendations

Consumer and lifestyle queries are the current worst-hit zone, where three conditions converge: large commercial arbitrage opportunities, few mainstream editorial sources, and low user verification intent. For these queries, the synthesized opinion AI search produces should be treated as a lead, not a conclusion. At the product level, the auto-generated synthesis feature should be downranked or disabled for this category and replaced with a per-source listing of raw results, letting users see which domain backs each claim.

Any AI answer containing academic citations or standard numbers requires independent verification of the citations themselves. The engineering implementation is straightforward: post-process the model output, use regexes to extract patterns like "X et al. (year)," "ASTM D" followed by four digits, and "University of [X] Extension," then batch-check them against DOI resolvers, Crossref, institutional websites, and standards-body directories (a rough sketch follows). This feature has not yet become standard in mainstream AI products.
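Here is a rough Python sketch of that post-processing step, assuming the public Crossref works API. The regexes cover only the patterns named above, and the example answer is fabricated for illustration; ASTM/AATCC numbers and university extension reports would still need lookups against the standards bodies' own catalogs and .edu sites, which have no single API.

```python
# Sketch: extract citation-shaped strings from a model answer and sanity-check
# author/year citations against Crossref. Patterns and the example answer are illustrative.
import re
import requests

CITATION_PATTERNS = {
    "author_year": re.compile(r"[A-Z][a-z]+ et al\.?\s*\((?:19|20)\d{2}\)"),
    "astm": re.compile(r"ASTM\s+D\s?\d{4}"),
    "extension": re.compile(r"University of [A-Z][A-Za-z]+(?:[ -][A-Za-z]+)* Extension"),
}

def extract_citations(answer: str) -> dict[str, list[str]]:
    """Pull citation-shaped strings out of a model answer."""
    return {name: pat.findall(answer) for name, pat in CITATION_PATTERNS.items()}

def crossref_has_match(citation: str) -> bool:
    """Ask Crossref whether anything bibliographically similar exists at all.
    No hit is a strong red flag; a hit still needs a human read of the actual record."""
    resp = requests.get(
        "https://api.crossref.org/works",
        params={"query.bibliographic": citation, "rows": 3},
        timeout=15,
    )
    resp.raise_for_status()
    return bool(resp.json()["message"]["items"])

if __name__ == "__main__":
    answer = ("A 2022 University of Wisconsin-Madison Extension study and "
              "Smith et al. (2023) report 24.63% faster drying per ASTM D5867.")
    for kind, hits in extract_citations(answer).items():
        print(kind, "->", hits)
    # Only author/year citations map cleanly onto Crossref; standards numbers and
    # extension reports need checks against ASTM/AATCC catalogs and institutional sites.
    for cite in extract_citations(answer)["author_year"]:
        print(cite, "crossref hit:", crossref_has_match(cite))
```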

When using AI agents for research workflows, be especially wary of consensus signals. If multiple subagents return the same number or the same study name, the default assumption should be that they are drawing from the same upstream fabrication source. Manually locate at least one primary link before accepting the claim. Three agents all mentioning a “University of Wisconsin-Madison Extension 2022 study” most likely means three agents retrieved from the same cluster of content farms.
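As a minimal illustration of that consensus check, the sketch below flags claims that recur across subagent outputs so they can be traced to a primary source before being accepted. The claim patterns and the agent outputs are placeholders, not part of any real pipeline; in practice you would match whatever claim shapes your domain produces (study names, standard numbers, precise figures).

```python
# Sketch: treat identical claims across subagents as one upstream source, not
# independent confirmation. Patterns and example outputs are illustrative only.
import re
from collections import Counter

CLAIM_RE = re.compile(
    r"University of [A-Z][A-Za-z-]+(?: [A-Za-z]+)* \d{4} study"  # e.g. "University of X Extension 2022 study"
    r"|\d+\.\d{2}\s*%"                                           # suspiciously precise percentages
)

def shared_claims(agent_outputs: list[str]) -> list[str]:
    """Claims that show up in more than one subagent's answer."""
    counts = Counter()
    for output in agent_outputs:
        counts.update(set(CLAIM_RE.findall(output)))  # count each claim once per agent
    return [claim for claim, n in counts.items() if n > 1]

if __name__ == "__main__":
    outputs = [
        "Per a University of Wisconsin-Madison Extension 2022 study, drying time fell 24.63%.",
        "Reviewers cite a University of Wisconsin-Madison Extension 2022 study showing 24.63% savings.",
        "No peer-reviewed data found; seller pages repeat a 24.63% figure without a source.",
    ]
    for claim in shared_claims(outputs):
        print("shared across agents, locate a primary link first:", claim)
```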

Domain trust levels need recalibration. Content on Alibaba-affiliated e-commerce domains is mostly seller-generated or agency-produced platform UGC. Its trust level should be equivalent to or lower than that of an anonymous blog. Medium and Substack are the same: the platform does not vouch for the content, and citations from these sources should be downranked when used by models.

For medical, children’s product, and safety-related queries, under the current state of the corpus, AI web search is not suitable for delivering synthesized conclusions. Google took down some medical AI Overviews in 2025 after outputs contradicted mainstream medical guidance (for example, telling pancreatic cancer patients to avoid high-fat foods). This is a downstream symptom of corpus pollution; upgrading the model does not fix it.

Over the past two years, AI web search has transformed models from closed-book to open-book exams. But open-book only works if the reference material is trustworthy. When the reference pool is flooded with fabricated academic citations, content specifically optimized for AI crawlers, and material presented in multi-source-consensus form, the open book itself becomes a risk vector: the model earnestly parrots a body of fabricated material carefully placed in front of it.

Two personal habits will stay effective for as long as the cost of synthesis remains near zero: do not accept AI-generated synthesized conclusions for consumer and lifestyle questions; and for any answer containing academic or standards citations, assume fabrication by default and verify each citation before relying on it.

Appendix: Key Sources

AI Search Pollution and Hallucination:
- NewsGuard, AI False Claims Monitor (monthly, 2024–2025)
- NewsGuard, AI content farm tracking
- Futurism, Google AI Overviews accuracy (citing Oumi/NYT)
- Ahrefs, fake-brand experiment (reposted on LinkedIn)
- ZipTie, ChatGPT source-link error rates
- The Verge, Google taking down medical AI Overviews

Data Voids and RAG Echo Chambers:
- Golebiewski & boyd, Data Voids (Data & Society)
- HKS Misinformation Review, LLMs citing Pravda-network domains
- IJCAI 2025, Data Void Exploits in RAG

SEO Poisoning and Cross-Border Operations by Chinese Teams:
- Cisco Talos, DragonRank
- Palo Alto Unit 42, Operation Rewrite
- SPLX / The Hacker News, AI-targeted cloaking
- ZeroFox, SEO poisoning of LLMs

315 Gala and the Domestic GEO Industry:
- 21st Century Business Herald (21jingji), detailed report on the 2026 315 Gala GEO poisoning demonstration
- The Paper (Pengpai News), follow-up investigation
- 36Kr, full list of exposed operators
- Asia News Network (citing CMG/CCTV), Apollo-9 GEO fabrication investigation
- NBC News, Shanghai Haixun fake news network

Fabricated Academic Citations:
- HKS Misinformation Review, GPT-fabricated papers on Google Scholar
- Mata v. Avianca, Inc. (Wikipedia)
- Rolling Stone (citing Georgia State research), AI fake citations in academic databases
- AAAI Proceedings, citation hallucination detection
- AATCC official standards catalog

Consumer Reviews and Trust Migration:
- Ahrefs, Wirecutter SEO case study
- Hacker News, discussion of appending "reddit" to searches
- Pew Research (via The Verge), AI Overview click-through rates