AI Products and Platforms: Industry and Competition

Three Locks: Why Google and Microsoft Can't Build Agentic Document Editing

It’s 2026, and Copilot still can’t edit your PowerPoint slides.

That sounds like a joke, but it’s a fact. Before February 2026, Copilot in PowerPoint could not modify existing slides at all. When users asked it to edit, the response was: “I’m sorry, but I can’t do that.” After February, Microsoft announced Edit with Copilot, but it was web-only, with Windows and Mac still waiting. Third-party testing concluded that even the new capabilities remain surface-level: text rewrites and color changes, but no narrative restructuring, no cross-slide logic adjustments, no rewording of text inside shapes. In Prezent.ai’s words: “It makes PowerPoint faster. It does not fundamentally change how presentations are built.”

Google is no better. Gemini in Google Docs is a chat window: you ask it a question, it generates a paragraph, and you paste it in yourself. It cannot restructure documents, reorganize content across paragraphs, or complete any multi-step operation without your intervention. MindStudio’s analysis puts it bluntly: “Gemini in Workspace is confined to in-document tasks.” Zapier’s 2026 annual review of AI presentation tools removed both Copilot and Gemini from its recommended list entirely.

What makes this absurd is that the technology isn’t hard. Claude Cowork shipped .docx and .pptx editing in roughly two weeks in early 2026. But that “two weeks” needs context: Cowork is not a document editor built from scratch. It is a natural extension of Claude Code, an agentic coding tool that already had a complete agent foundation — planning, execution, tool calling, error recovery. Cowork simply pointed these existing capabilities at new file types, running Python code in an isolated VM to manipulate OOXML structures via off-the-shelf libraries like python-pptx and python-docx. Once you have the agent foundation, connecting a new file type is fast.
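The mechanics are worth making concrete. A .pptx file is a zip archive of OOXML parts, and slide text lives in DrawingML `<a:t>` runs; libraries like python-pptx wrap exactly this. A stdlib-only sketch of the underlying operation (the XML fragment and the `replace_text` helper are my illustration, not Cowork’s actual code):

```python
import xml.etree.ElementTree as ET

# DrawingML namespace: visible slide text lives in <a:t> runs inside slideN.xml parts.
A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"
P_NS = "http://schemas.openxmlformats.org/presentationml/2006/main"

# A heavily stripped-down slide fragment; a real slide part carries far more markup.
slide_xml = (
    f'<p:txBody xmlns:p="{P_NS}" xmlns:a="{A_NS}">'
    "<a:p><a:r><a:t>Q3 Revenue</a:t></a:r></a:p>"
    "</p:txBody>"
)

def replace_text(xml_str: str, old: str, new: str) -> str:
    """Swap the text of matching <a:t> runs, leaving all formatting markup intact."""
    root = ET.fromstring(xml_str)
    for t in root.iter(f"{{{A_NS}}}t"):
        if t.text == old:
            t.text = new
    return ET.tostring(root, encoding="unicode")

print(replace_text(slide_xml, "Q3 Revenue", "Q4 Revenue"))
```

In a real pipeline the agent would unzip the .pptx, rewrite the relevant slide part this way (or through python-pptx/python-docx), and rezip. Once the agent foundation can run code like this in a sandbox, supporting a new file type is mostly wiring.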

And the current implementation is still rough. Cowork cannot even “see” what its generated slides look like — Claude’s Read tool doesn’t support image files, which means it edits PowerPoint essentially blind: manipulating XML structure without knowing whether the rendered result is correct. This problem is solvable in engineering (render to PNG via LibreOffice for AI self-inspection), but Cowork hasn’t taken that step yet.
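For what it’s worth, the missing step is not exotic. A sketch of the render leg of such a loop, assuming LibreOffice’s `soffice` and poppler’s `pdftoppm` are on PATH (the function names are mine, not Cowork’s):

```python
import subprocess
from pathlib import Path

def render_commands(pptx: str, outdir: str) -> list[list[str]]:
    """Shell commands to turn a deck into per-slide PNGs: .pptx -> PDF -> PNGs."""
    pdf = Path(outdir) / (Path(pptx).stem + ".pdf")
    return [
        # LibreOffice renders the deck headlessly; pdftoppm splits PDF pages into PNGs.
        ["soffice", "--headless", "--convert-to", "pdf", "--outdir", outdir, pptx],
        ["pdftoppm", "-png", str(pdf), str(Path(outdir) / "slide")],
    ]

def render(pptx: str, outdir: str) -> None:
    for cmd in render_commands(pptx, outdir):
        subprocess.run(cmd, check=True)
```

The resulting PNGs could then be handed back to a vision-capable model to check for overflow, overlap, and broken layout before the edit is declared done.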

Startups prove the same point from a different angle. Harvey AI built single-query editing for 100+ page legal contracts. Gamma built an AI presentation tool with 70 million users and a $2.1 billion valuation. These are not demos; they are products running in production.

So the question is: Google and Microsoft have the strongest AI models and the largest productivity ecosystems. Why can’t their own AI edit a slide in their own apps?

The intuitive answer is “big companies are slow.” But this doesn’t survive scrutiny. These companies are full of smart people. What they lack is not engineering capability, or even understanding of what an agent architecture should look like — when Microsoft built Copilot Cowork, they used Anthropic’s Claude model and agentic framework, proving they know the blueprint. What they lack is the agent foundation itself. And the decision not to build that foundation is what needs explaining.

Some point to technical reasons: OOXML is too complex, Google Docs’ collaboration model doesn’t fit AI’s speed. These are real facts, but they are symptoms, not root causes. Harvey AI built an OOXML translation layer in a few months. Microsoft owns every byte of OOXML documentation and engineering capability; format complexity would not be a blocker if they genuinely wanted to solve it. Similarly, Copilot’s RAG architecture — retrieve context, generate text suggestions, no direct document write access, no verification loop — looks like a technical limitation, but it was chosen, not inevitable.
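The gap between the two architectures is small in code but large in consequence. A toy contrast, with stub functions standing in for retrieval, generation, and the document API (every name here is illustrative):

```python
# Stubs standing in for real retrieval, model, and document APIs.
def retrieve(doc: str, query: str) -> str:
    return doc[:200]                        # fetch some context

def generate(prompt: str) -> str:
    return f"[suggested edit for: {prompt.split(' |')[0]}]"

def apply_edit(doc: str, edit: str) -> str:
    return doc + "\n" + edit                # direct write access to the document

def verify(doc: str) -> bool:
    return "[suggested edit" in doc         # e.g. render the result and inspect it

def rag_sidebar(doc: str, query: str) -> str:
    """Copilot-style: retrieve, suggest, stop. The human pastes the text in."""
    return generate(f"{query} | context: {retrieve(doc, query)}")

def agent_edit(doc: str, query: str, max_steps: int = 3) -> str:
    """Agent-style: write to the document, verify, retry until the check passes."""
    for _ in range(max_steps):
        doc = apply_edit(doc, rag_sidebar(doc, query))
        if verify(doc):
            return doc
    raise RuntimeError("no verified edit produced")
```

Note that the sidebar pipeline is a strict subset of the agent loop: what shipped is the agent architecture with the write and verify steps removed.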

So why not? I believe there are three reasons. None is fatal on its own, but together they form a remarkably stable equilibrium that keeps both companies trapped in the “chat sidebar” pattern.

Lock 1: Revenue Model Conflict

Microsoft’s entire productivity business runs on per-seat subscriptions: $30 per user per month for Copilot. This model creates a fundamental contradiction: the more agentic the AI, the fewer seats customers need.

If an AI agent can autonomously do an employee’s work, enterprises don’t need 500 seats — they need 100, or 50, or fewer. This isn’t hypothetical. In early 2026, a widely circulated investment memo described the SaaSpocalypse: the prospect of AI agents compressing seat counts had already erased roughly $2 trillion from SaaS market capitalization.

This puts Microsoft in an impossible position. Build a truly agentic AI, seat counts drop, revenue declines. Build a merely assistive AI, each seat gets slightly more productive, seat counts hold, revenue stays stable. Microsoft chose the latter. The result is a $30/month chat sidebar that makes everyone slightly more efficient but replaces no one.

The product tells the story. Aragon Research’s analysis said it directly: Copilot was poorly designed, primarily to enhance and protect Microsoft’s cash cows. The pricing strategy confirms the gap: Copilot was discounted from $30 to $18, fewer than 20% of salespeople met their AI product quotas, and Microsoft took the unusual step of cutting AI sales targets in half. When the market leader starts slashing prices and running promotions, the distance between product value and price point is clear.

Bloomberg reported a more direct case: pharmaceutical giant Amgen bought Copilot for 20,000 employees. Most gravitated toward ChatGPT instead, with Copilot used mainly for Microsoft-specific tasks like Outlook and Teams. Bain & Company data showed a ratio of roughly 8:1 ChatGPT seats to Copilot seats. When employees have a choice, they pick the product built by an AI-native company over the AI bolted onto legacy software.

Nadella himself knows this dilemma. In September 2025, he told employees at a town hall that he was haunted by DEC’s story. DEC’s engineers built microcomputer prototypes, but management refused to sell them at scale because doing so would cannibalize the high-margin minicomputer business. DEC was eventually acquired by Compaq and disappeared. In Nadella’s words: “some of the businesses we’ve built over 40 years may no longer be relevant.”

But knowing and acting are very different things. Microsoft’s organizational response was not to restructure product teams. It was to promote four sales executives to EVP. A product problem, addressed with sales.

Lock 2: Organizational Architecture

Conway’s Law says that a system’s architecture mirrors the communication structure of the organization that built it. Google’s and Microsoft’s productivity teams were built over two decades for deterministic, human-in-the-loop software. Their testing infrastructure, release cadences, permission models, and quality metrics all assume humans are the primary actors. AI bolted onto this architecture produces a chat sidebar — because “user initiates request, system offers suggestion, user confirms” is exactly the interaction pattern these teams are best at building.

There is an even more specific layer of technical debt. In 2023, Google and Microsoft both faced the same question — how to integrate AI into productivity apps — and independently arrived at the same answer: a universal chat sidebar component, one UI adapted across Docs, Sheets, Slides, Drive, and Gmail. From an engineering reuse perspective this made perfect sense: one team covering all products. But the choice created path dependency. The chat sidebar’s entire technology stack — input box, conversation history, markdown rendering, copy-paste buttons — was designed for single-turn question-and-answer. Converting it to an agent requires redesigning the interaction model (how to show what the AI is doing), state management (what happens when the AI fails mid-operation), error handling (how to roll back), and compatibility with existing features (how does this coexist with comments, revision history, and collaborative editing). This is not adding a button; it is starting over. Claude Code was designed for agents from day one, and Cowork inherited that architecture, making it natively agentic. This is not Anthropic being smarter — it is Anthropic not carrying chat sidebar debt.
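One of those redesigns is easy to underestimate: an agent that writes directly to a file needs a story for partial failure that a suggestion box never did. A minimal snapshot-and-restore sketch (my illustration of the state-management problem, not any shipping implementation):

```python
import shutil
import tempfile
from pathlib import Path

def edit_with_rollback(path: str, edit_fn, verify_fn) -> bool:
    """Apply edit_fn to a document file; keep the change only if verify_fn approves."""
    backup = Path(tempfile.mkdtemp()) / Path(path).name
    shutil.copy2(path, backup)          # snapshot before the agent touches anything
    try:
        edit_fn(path)
        if verify_fn(path):
            return True                 # verified: keep the edit
        shutil.copy2(backup, path)      # verification failed: restore the snapshot
        return False
    except Exception:
        shutil.copy2(backup, path)      # crash mid-edit: restore, then re-raise
        raise
```

Even this toy version raises the questions the sidebar never had to answer: what counts as verification, and how rollback coexists with comments, revision history, and live collaborators.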

Google’s case is the most dramatic. Sergey Brin’s leaked memo contains a remarkable detail: Google maintained an approved-tools list for internal development, and Gemini was on the banned list for “weird historical reasons”. Brin has supervoting shares — he can theoretically decide anything at the company. But he said that changing this policy took “a shockingly long period of time.” He eventually went to Sundar Pichai: “I can’t deal with these people, you need to deal with this.”

This story matters beyond the humor. If a founder with controlling equity cannot quickly fix an obviously wrong internal policy, what is the coordination cost of getting security, legal, compliance, infrastructure, and product teams to collaborate on a feature that requires deep cross-team integration — like agentic document editing?

Google’s organizational difficulties are well documented. Sundar Pichai has 18 direct reports. Each product surface (Search, Workspace, Cloud, Android, YouTube) operates semi-autonomously with its own P&L. The Workspace team cares about productivity metrics and enterprise customer satisfaction. The Cloud team cares about competing with AWS and Azure. The Gemini team cares about model capabilities. Google has tried multiple reorganizations — merging Brain and DeepMind, moving the Gemini app under DeepMind, creating bridge PMs and AI councils — with limited success. The core tension reported by researchers is that research excellence and product velocity require opposite organizational cultures, and no single structure can optimize both.

Microsoft’s problem manifests differently but has the same root. The Copilot brand was applied to Windows, Edge, Bing, Microsoft 365, GitHub, Azure, Power Platform, and Dynamics 365. Each product team independently added Copilot features, creating an incoherent user experience. NoJitter tracked Copilot’s brand evolution: 2023 personal assistant, 2024 multiplayer AI, 2025 team collaboration, 2026 autonomous agents / Frontier Firm. A new positioning every year, none fully executed. This is not strategic indecision; it is the projection of a dozen product teams acting independently onto the brand layer.

By contrast, AI-native companies have the advantage of organizational simplicity. At OpenAI, research and product are not separated — the team that built GPT-4 also ships ChatGPT, and researchers see user feedback daily. This tight coupling is not a management technique; it is the natural state of a small organization. Google has been trying to replicate it through restructuring, but reorganizing a large organization usually changes reporting lines without changing how information actually flows. Conway’s Law teaches that if you cannot change how information flows, you cannot change the product’s architecture.

Lock 3: The Liability Vacuum

Enterprise resistance to agentic AI is not irrational. EY’s survey shows nearly 9 in 10 enterprise leaders identify roadblocks to agentic AI adoption. Deloitte found that only one in five enterprises has a mature governance model for autonomous agents. Icertis’ survey shows 56% of executives are “very concerned” about granting agents autonomy to make business decisions without guardrails. Gartner found that 83% of companies struggle to integrate Copilot into daily work due to weak information governance.

Behind these numbers is a concrete question: if Copilot autonomously modifies a contract’s liability clause, or changes a key assumption in a financial model, who is responsible when something goes wrong? The user? Microsoft? The model provider? Reuters’ April 2026 analysis noted that traditional legal frameworks assume humans maintain substantive control over a system’s actions. Agentic AI directly challenges this premise. There are no mature legal frameworks, insurance products, or contract templates for assigning liability for autonomous AI actions.

This creates an asymmetric risk structure. An agentic AI making a mistake — say, automatically modifying an important contract — could trigger massive lawsuits and reputational damage. But getting ten things right might go entirely unnoticed by the user. The downside far outweighs the upside. For companies serving hundreds of millions of users, conservatism is rational: have users confirm every step rather than letting AI act autonomously.

Startups can be more aggressive precisely because they serve narrower scenarios with more expert users. Harvey AI’s users are lawyers who review every document. Gamma’s users create internal presentations, not legal contracts. Agentic behavior is viable in these contexts because the consequences of errors are contained by user expertise and low-stakes scenarios. Microsoft and Google serve all scenarios across all industries and cannot assume that every user will carefully review AI output.

Moreover, the liability vacuum is not just an external constraint on big companies — it is also a legitimate weapon for internal resistance. Any team that prefers the status quo can invoke “compliance risk” and “user safety” to veto agentic features. This is how the liability vacuum and organizational architecture reinforce each other: things that can’t get pushed through the organization find a perfect justification.

The Three Locks Interlock

Revenue model conflict, organizational architecture, and the liability vacuum reinforce each other.

The revenue model means that investing in agentic features never becomes a priority — why spend heavily on something that might cannibalize your own revenue? Organizational architecture means that even when leadership recognizes the problem (Nadella’s DEC speech was clear enough), execution teams cannot coordinate the deep cross-team integration required. The liability vacuum gives everyone who resists change a reason that is both legally and morally defensible.

Together, these three locks determine the technical choices these companies make: RAG instead of agent architecture, no OOXML translation layer, no render-verify loop. These technical choices are not the result of insufficient engineering capability. They are the conservative options rationally selected under the constraints of the three locks.

This is why Nadella can cite DEC at a town hall to warn employees, while the organization’s actual response is to promote sales executives. He is not uninformed. The combined force of the three locks is simply too great.

Who Is Breaking the Equilibrium

Equilibria are not permanent. External forces are applying pressure from different directions.

One category bypasses the battlefield entirely. Gamma’s 70 million users demonstrate a simple fact: many people don’t need .pptx files. When a presentation becomes a scrollable web page rather than a slide deck, Office’s format lock-in becomes irrelevant. Notion’s AI Agent can autonomously edit block-based documents for 20 minutes. These tools never touch OOXML, so the format barrier doesn’t apply.

Another category does what big tech cannot, on big tech’s own turf. Harvey AI built the OOXML translation layer in months and achieved single-query editing of 100+ page contracts in Word. Claude Code and Cowork treat .docx/.pptx as data structures to be manipulated in code, and the render-and-verify loop that would close the feedback cycle is an engineering step away, not a research problem. These tools prove that the big companies don’t fail to build this because they can’t.

A third category consists of AI-native work platforms. Coda turns documents into programmable applications. These platforms don’t compete on format or features — they change what users expect “office software” to be.

These three forces do not yet constitute a fatal threat to Office and Google Workspace. Enterprise inertia and format lock-in remain powerful — a consulting firm cannot deliver a Gamma link instead of a PowerPoint deck. But the direction is clear.

A Framework for Judgment

Back to the original question: why, in 2026, can’t Google’s and Microsoft’s AI agentically edit documents in their own apps?

“Big companies are slow” is a correct observation but not a useful explanation. A useful explanation identifies specific locking mechanisms: the revenue model conflict makes agentic features commercially deprioritized; organizational architecture makes deep cross-team integration extremely difficult to execute; the liability vacuum makes conservative strategy the rational choice on legal and compliance grounds. These three locks are not anyone’s mistake — they are constraints naturally accumulated over two decades of success.

This framework applies beyond document editing. Any large company trying to build agentic AI into its core products can be tested against these three locks: does its revenue model permit AI to replace human work? Does its organizational architecture allow AI teams and product teams to deeply couple? Do its customer base and legal environment permit AI to act autonomously?

For those evaluating AI investments, a practical heuristic: don’t look at how strong a company’s AI model is. Look at whether its revenue model and organizational architecture allow that model to be fully unleashed. Model capability is a necessary condition, but far from sufficient. The reality of 2026 is that the strongest model is trapped inside the worst product, and the most profitable product is trapped inside the most conservative AI strategy.