
Pi: A Better AI Coding Tool, Locked Out

Published May 18, 2026

In April 2025, Mario Zechner started using Claude Code. Back then it was basic — Claude in a terminal, reading files, running commands, editing code. Mario liked the simplicity: four or five tools, a system prompt of a few thousand tokens, predictable behavior.

By late 2025, things had changed. Every Claude Code release tweaked the system prompt and tool definitions, breaking Mario’s workflows. The feature set ballooned to what he estimated as “80% of functionality I have no use for.” Sub-agents spawned as black boxes with no visibility into what they were doing. The terminal UI started flickering. He built cchistory to track the changes, and the more he tracked, the more frustrated he became.

In late November 2025, Mario published a blog post and a codebase — Pi, a coding agent built for his own use. System prompt under 1000 tokens. Four tools. Blog title: “What I learned building an opinionated and minimal coding agent.”

What’s Missing Matters More Than What’s Included

Pi’s design philosophy is best understood from its “NO list.” Mario laid out each feature he refused to add and why.

No MCP. The Playwright MCP server ships 21 tool definitions consuming 13,700 tokens — 7% of your context window before you start working. Mario’s alternative: CLI tools with README files. When the agent needs a capability, it reads the README and invokes the tool via bash. Tokens are consumed only when needed, not loaded on every session.
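
The 7% figure can be checked with simple arithmetic. The sketch below assumes a 200,000-token context window, which the article does not state; with a different window the share changes accordingly.

```python
# Assumption: a 200,000-token context window (not stated in the article).
CONTEXT_WINDOW_TOKENS = 200_000
PLAYWRIGHT_MCP_TOKENS = 13_700  # figure quoted above for the tool definitions
PLAYWRIGHT_MCP_TOOLS = 21       # number of tool definitions quoted above


def context_share(tokens: int, window: int = CONTEXT_WINDOW_TOKENS) -> float:
    """Return the fraction of the context window consumed by `tokens`."""
    return tokens / window


share = context_share(PLAYWRIGHT_MCP_TOKENS)
tokens_per_tool = PLAYWRIGHT_MCP_TOKENS / PLAYWRIGHT_MCP_TOOLS
print(f"share of window: {share:.2%}, tokens per tool: {tokens_per_tool:.0f}")
```

Under that assumption the definitions take a little under 7% of the window on every session, whether or not a single Playwright tool is called.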

No sub-agents. Claude Code’s orchestrator spawns a sub-agent for complex subtasks, and you never see what it read, what it did, or where it made mistakes. Mario calls it a black box inside a black box. Pi’s alternative: start a new Pi instance via bash, optionally inside tmux for full observability. You see the sub-agent’s entire interaction and can intervene.

No plan mode. No special read-only analysis mode needed. The agent thinks through the problem with you and writes the plan to PLAN.md. You see which files it read and which it missed — something Claude Code’s plan mode doesn’t offer because its orchestrator spawns a sub-agent and you lose visibility.

No built-in to-dos. Built-in task lists confuse models. Pi writes TODO.md with checkboxes. State is visible and editable to both human and agent.
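
A minimal sketch of why a plain TODO.md works as shared state: both a human with an editor and an agent with a few lines of code can read and update it. The Markdown task-list syntax and the helper names here are assumptions for illustration, not Pi's actual format or code.

```python
import re

# Matches Markdown task-list items such as "- [ ] write tests" or "- [x] done".
CHECKBOX = re.compile(r"^(\s*[-*] \[)([ xX])(\] )(.*)$")


def parse_todos(markdown: str) -> list[tuple[bool, str]]:
    """Return (done, text) pairs for every checkbox line in the markdown."""
    items = []
    for line in markdown.splitlines():
        match = CHECKBOX.match(line)
        if match:
            items.append((match.group(2).lower() == "x", match.group(4)))
    return items


def mark_done(markdown: str, task_text: str) -> str:
    """Tick the first unchecked item whose text equals `task_text`."""
    lines = markdown.splitlines()
    for i, line in enumerate(lines):
        match = CHECKBOX.match(line)
        if match and match.group(2) == " " and match.group(4) == task_text:
            lines[i] = f"{match.group(1)}x{match.group(3)}{match.group(4)}"
            break
    return "\n".join(lines)


todo_md = "# TODO\n- [x] reproduce the bug\n- [ ] write a failing test\n- [ ] fix the parser"
updated = mark_done(todo_md, "write a failing test")
print(updated)
```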

No permission popups. Mario considers permission dialogs on coding agents to be security theater. If the agent can read data, execute code, and access the network, game over. Accept YOLO mode or run in a container.

No background bash. Use tmux instead. You see the agent’s dev server output, can list all active sessions, and can jump into a session to co-debug. In early versions of Claude Code’s background bash, the agent forgot about running processes after context compaction with no way to query them — Mario found this messier than just using tmux.

No compaction. The most controversial item. Mario says he personally hasn’t needed it, that hundreds of exchanges fit comfortably in a single session. But other users — including Pi’s most vocal advocate, Armin Ronacher — have run into situations where context compression would help.

More Than Just “Less”

If Pi were merely a stripped-down Claude Code clone, it wouldn’t have attracted people like Armin Ronacher and Nader Dabit. Pi has three design features that set it apart from other harnesses.

First, session trees. Pi’s sessions can branch, rewind, and jump between branches. If an agent breaks a tool while debugging, you fork to a new branch, fix the tool, return to the main branch, and Pi summarizes what happened on the side branch. This dramatically lowers the cost of experimentation — no fear of contaminating context while the agent explores.
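
One way to picture a session tree is as messages linked to their parents, with a movable head. The sketch below is an illustrative data structure only; the class and method names are hypothetical and are not Pi's implementation, and the summary text is a stand-in for whatever Pi generates.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Node:
    """One message in the session; children are alternative continuations."""
    text: str
    parent: Optional["Node"] = None
    children: list["Node"] = field(default_factory=list)


class SessionTree:
    def __init__(self, root_text: str) -> None:
        self.root = Node(root_text)
        self.head = self.root  # the message the next append attaches to

    def append(self, text: str) -> Node:
        node = Node(text, parent=self.head)
        self.head.children.append(node)
        self.head = node
        return node

    def rewind(self, node: Node) -> None:
        """Move the head to an earlier node; the next append starts a branch."""
        self.head = node

    def context(self) -> list[str]:
        """Messages on the path from the root to the head."""
        path, node = [], self.head
        while node is not None:
            path.append(node.text)
            node = node.parent
        return path[::-1]


tree = SessionTree("system prompt")
main = tree.append("user: debug the failing build")
tree.append("agent: the lint tool itself is broken")  # side branch starts here
side_tip = tree.append("agent: patched the lint tool")

tree.rewind(main)  # return to the main branch
tree.append("summary: lint tool fixed on a side branch")
print(tree.context())
```

The point of the structure is that the side branch stays in the tree for inspection, while the context on the main branch carries only the one-line summary.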

Second, the extension system. Extensions are TypeScript code, hot-reloadable, readable and writable by the agent itself. No plugin marketplace — you tell the agent what you want, and it writes the code. Armin’s daily extensions are all agent-written: /answer (extracts and reformats questions from agent responses), /todos (manages markdown files under .pi/todos), /review (forks to a new branch for code review and brings fixes back), /files (lists files changed in the session). Mario maintains an agent-tools repo with CLI tools, each with a README the agent reads on demand — web search, image generation, YouTube search, arXiv paper retrieval. Ordinary CLI scripts, not MCP servers.

Third, RPC mode. Pi can be embedded as a subprocess in a larger automation system: your orchestration layer sends tasks, and Pi executes the coding work and returns structured output. Nader Dabit’s guide walks through the stack from the bare pi-ai API up through pi-agent-core’s agent loop, pi-coding-agent’s session management, and pi-tui’s terminal interface, demonstrating how to build a production-grade agent from Pi’s components. That’s how OpenClaw is built.
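
The general shape of driving an agent as a subprocess can be sketched as JSON lines over stdin and stdout. The article does not give Pi's command line or message schema, so the worker below is a stand-in Python child process and the message fields are hypothetical; a real setup would launch the agent binary with whatever flags and schema its documentation specifies.

```python
import json
import subprocess
import sys

# Stand-in worker: reads one JSON task per line, replies with one JSON line.
# Assumption: this is NOT Pi's real protocol, only an illustration of the pattern.
STAND_IN_WORKER = [
    sys.executable, "-u", "-c",
    "import json, sys\n"
    "for line in sys.stdin:\n"
    "    task = json.loads(line)\n"
    "    print(json.dumps({'id': task['id'], 'status': 'done'}), flush=True)\n",
]


def run_tasks(command: list[str], tasks: list[dict]) -> list[dict]:
    """Send each task as a JSON line and collect one JSON reply per task."""
    replies = []
    with subprocess.Popen(
        command, stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True
    ) as proc:
        for task in tasks:
            proc.stdin.write(json.dumps(task) + "\n")
            proc.stdin.flush()
            replies.append(json.loads(proc.stdout.readline()))
    # Leaving the with-block closes the pipes and waits for the child to exit.
    return replies


results = run_tasks(STAND_IN_WORKER, [{"id": 1, "prompt": "add a unit test"}])
print(results)
```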

What It Feels Like to Use

Armin Ronacher wrote about switching to Pi in late January 2026. No flickering, no memory bloat, no random crashes. A slight flicker in VS Code’s built-in terminal, but better than Claude Code — Armin says he was so used to Claude Code’s flicker that “not having it feels wrong.”

The real difference is transparency. Armin compares Claude Code’s plan mode to Pi’s approach: Claude Code spawns a sub-agent for analysis, and you see the result but not the process — no way to know which source files it read and which it missed. In Pi, you see every read, every bash call, every edit. Plans live in files you edit together with the agent. “I need observability for planning and I don’t get that with Claude Code’s plan mode.”

Pawel Jozefiak ran a six-harness comparison in April 2026. His verdict on Pi was the most personal line in the entire piece: “I love Pi, but I can’t use it.” Fast, flexible, clean — but unusable. The reason had nothing to do with Pi itself.

Terminal-Bench 2.0 data tells the same story. Cline’s team ran Pi-Code, OpenCode, and Cline across four open-weight models in May 2026:

Model	Cline	Pi-Code	OpenCode
kimi-k2.6	55.1%	45.5%	37.1%
deepseek-v4-pro	53.9%	52.9%	51.7%
glm-5.1	49.4%	51.7%	52.8%
minimax-m2.7	42.9%	46.0%	39.3%

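Pi beat Cline on two of four models (glm-5.1 and minimax-m2.7) and trailed by 1 point on deepseek-v4-pro. Its one clear loss was kimi-k2.6, where it trailed Cline by 9.6 points, and OpenCode edged it by 1.1 points on glm-5.1. On three of the four open-weight models, Pi was either the top scorer or within about a point of it.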
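
The per-model gaps can be recomputed directly from the table above. This is only a small check on the published numbers; the variable and function names are illustrative.

```python
# Scores in percent, copied from the table above.
SCORES = {
    "kimi-k2.6": {"Cline": 55.1, "Pi-Code": 45.5, "OpenCode": 37.1},
    "deepseek-v4-pro": {"Cline": 53.9, "Pi-Code": 52.9, "OpenCode": 51.7},
    "glm-5.1": {"Cline": 49.4, "Pi-Code": 51.7, "OpenCode": 52.8},
    "minimax-m2.7": {"Cline": 42.9, "Pi-Code": 46.0, "OpenCode": 39.3},
}


def pi_gap(model: str, rival: str) -> float:
    """Pi-Code's score minus the rival's score, in percentage points."""
    return round(SCORES[model]["Pi-Code"] - SCORES[model][rival], 1)


pi_vs_cline = {model: pi_gap(model, "Cline") for model in SCORES}
winners = {model: max(row, key=row.get) for model, row in SCORES.items()}

for model in SCORES:
    print(f"{model}: Pi vs Cline {pi_vs_cline[model]:+.1f}, best: {winners[model]}")
```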

What Pi Spawned

Pi isn’t just a coding agent. Its stack has become infrastructure.

OpenClaw is Peter Steinberger’s multi-channel AI agent. Users send messages through Telegram, WhatsApp, Slack, and OpenClaw executes in the background. When it went viral in late January 2026, Armin’s blog post was titled “Pi: The Minimal Agent Within OpenClaw” — OpenClaw’s engine is Pi.

mom is Mario’s own Slack bot built on Pi’s stack. Per-channel agent isolation, Docker sandboxing, scheduled tasks. Mario says pointing Pi at its own codebase and mom’s config lets the agent build a new one.

The extensions ecosystem grew fast through Armin, Nico, and others. Nico wrote pi-subagents, adding sub-agent capability through Pi’s extension API — Mario saying no doesn’t stop anyone. Someone wrote pi-interactive-shell for observable interactive CLI in a TUI overlay. pi-agentic-compaction adds context compression using a virtual filesystem. Tao of Mac began tracking the ecosystem: Graphone desktop client, pi-queue task runner, pi-token-burden token stats, pi-generative-ui, Mercury personal AI assistant. Someone forked the entire codebase into earendil-works/pi, accumulating 4,176 commits by May 2026.

The pattern across all of them: using Pi’s primitives — pi-ai’s LLM abstraction, pi-agent-core’s agent loop, pi-coding-agent’s session management — as a foundation layer. Pi’s core repo is maintained by Mario alone (he’s explicit about being a dictatorial maintainer), but the things built on top of it extend far beyond one person.

So Why Isn’t Pi Everywhere

The benchmarks show no consistent gap. Terminal-Bench puts Pi ahead of or within about a point of Cline and OpenCode on three of four open-weight models, with kimi-k2.6 the exception. Four tools and a sub-1000 token prompt don’t hurt performance. The extension system lets the agent fill in missing capabilities. Session trees, RPC mode, and cross-provider switching are features other harnesses either don’t have or do less naturally.

But actual adoption is low. Pawel Jozefiak’s “I love Pi, but I can’t use it” captures the core barrier.

Pi is BYOM (bring your own model): you supply your own API key. Claude users who want to try Pi need to pay API fees on top of their existing Max subscription ($100-200/month). Anthropic does not allow Max subscription credits to be used with third-party harnesses; the money you pay Anthropic for the subscription can only be consumed inside Claude Code. When Pawel tested Pi, he received an explicit message: usage through Pi is billed through the API, not charged against the Max subscription.

This means Claude users effectively pay twice for the same model. Anthropic’s API pricing isn’t cheap — Opus at $25 per million output tokens, and long coding sessions burn through tens of thousands of tokens. On top of the $100-200 monthly subscription, the economics don’t work.

Google’s strategy is the same. Gemini CLI is free but locks you into Google’s harness; subscription credits can’t be used with third-party tools. Compare with OpenAI: API credits work with any compatible third-party harness. You use Pi with GPT-5, you pay API fees, no extra subscription required. The same dollar in Anthropic’s ecosystem can only be spent on Claude Code; in OpenAI’s ecosystem, it can be spent on any compatible tool — this difference determines whether an independent harness can survive in each model ecosystem.

This isn’t Pi’s fault. Pi’s cost model is clean on GPT and open-weight models — you pay for the model API calls you use. But Claude users make up a large portion of the AI coding tool user base. Pi’s best potential audience is locked out by Anthropic’s subscription strategy.

What Comes Next

Pi proved several things. Four tools and a sub-1000 token system prompt make a viable coding agent — the benchmarks and community feedback confirm it. Letting the agent extend itself through code works in practice, as Armin’s daily usage demonstrates. Keeping the core minimal and building with primitives is the right approach — OpenClaw, mom, and a dozen community projects built on Pi are the evidence.

What Pi hasn’t proven: whether an independent harness can survive against model companies’ ecosystem lock-in. Technical feasibility was never the question; business strategy will decide the outcome. Mario doesn’t care: he built Pi for himself, and user count is irrelevant to him. But if Pi’s design philosophy is right, if a good agent runtime doesn’t need a ten-thousand-token system prompt and thirty tools, then the philosophy needs a way to reach the people it would serve best.

The current path is a workaround. GPT and DeepSeek users can use Pi directly, with clear costs. Claude users either pay twice, wait for Anthropic to change its policy, or wait for the kind of hill-climbing that makes switching harnesses worth the extra cost, the way Cline SDK has demonstrated.

Mario ended his blog post with a line: if Pi doesn’t fit your needs, fork it. He meant it.

Pi repository: github.com/badlogic/pi-mono. Mario’s original blog: mariozechner.at. Armin’s experience: lucumr.pocoo.org/2026/1/31/pi. Pawel’s comparison: thoughts.jock.pl.