
Vercel open-sources eve: why "an agent is a directory" is not a throwaway line

Published Jun 18, 2026

On June 17, 2026, Vercel open-sourced eve under Apache 2.0 at its Ship conference in London; the repo is at github.com/vercel/eve. The official blog post went with a single sentence as its headline: “An agent is a directory.” Simple as that.

The confusion hits on first read. Making a folder — anyone can do that. If the ultimate answer to agent frameworks is just a directory structure, then what exactly has the entire industry been arguing about for the past few years?

The confusion isn’t unfounded. That sentence collides head-on with our default assumptions about what an agent framework should be. Over the past two years, building agents came down to two paths. One is the LangChain way: stitch together Tool, Chain, and Memory in Python, handle registration and deployment yourself. The other is the Claude way: configure a few connectors on Anthropic’s cloud and let Claude reach into your Slack and databases. eve says both paths got it wrong. eve treats the agent as standalone software: the filesystem serves as the definition, Git handles versioning, one command deploys it to production.

Behind the three approaches lies the same question, answered three different ways: what even is an agent? LangChain treats it as a programming problem, and its answer is that you assemble the parts yourself. Anthropic treats it as an extension of the model, and its answer is to give Claude a few doors to open. eve treats it as standalone software, and its answer is a directory. This isn’t a difference in feature count. It’s a fork in the road: depending on which path you follow, the question of whether an agent graduates from demo to 24/7 production service gets a very different answer.

LangChain: assemble the parts yourself

Most people doing agent development run into LangChain first. LangChain gives you a set of programming abstractions: Agent, Tool, Chain, Memory, Retriever. You stitch these components together in Python, write input/output schemas for each tool, write registration code to attach them to the agent, manage session state yourself, wire up a vector store for RAG, and at deployment time you set up an API server and configure monitoring on your own. Maximum freedom. The cost is you have to build every piece yourself.
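To make the "build every piece yourself" cost concrete, here is a minimal sketch of hand-written tool registration. It is deliberately not LangChain's API: `Tool`, `ManualRegistry`, and the stubbed `run_sql` behavior are hypothetical names invented for illustration. What it shows is only the shape of the glue code that the assemble-it-yourself approach tends to require.

```typescript
// Hypothetical sketch, NOT LangChain's API: every tool must be described
// and attached to the agent by hand.
interface Tool {
  name: string;
  description: string;
  run: (input: { [key: string]: unknown }) => string;
}

class ManualRegistry {
  private tools: { [name: string]: Tool } = {};

  // Registration is an explicit step the developer has to remember.
  register(tool: Tool): void {
    if (Object.prototype.hasOwnProperty.call(this.tools, tool.name)) {
      throw new Error("duplicate tool: " + tool.name);
    }
    this.tools[tool.name] = tool;
  }

  // Look a tool up by name and invoke it.
  call(name: string, input: { [key: string]: unknown }): string {
    if (!Object.prototype.hasOwnProperty.call(this.tools, name)) {
      throw new Error("unknown tool: " + name);
    }
    return this.tools[name].run(input);
  }

  names(): string[] {
    return Object.keys(this.tools).sort();
  }
}

const registry = new ManualRegistry();
registry.register({
  name: "run_sql",
  description: "Run a read-only SQL query (stubbed for this sketch)",
  run: (input) => "would run: " + String(input["query"]),
});
```

If the `register` call is forgotten, or the name drifts from the file that implements the tool, nothing flags the mismatch until the agent tries to call it.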

This is a lot like buying a cabinet from IKEA. The boards and screws are all laid out in front of you. In theory you can assemble them into any shape. But assembly is only step one — after that comes painting, mounting, attaching the handles. LangChain’s responsibility ends at how the parts connect. How your agent files are organized, how it gets deployed, whether an in-flight session breaks during a deploy, whether it burns compute while waiting for approval — none of this is LangChain’s concern. The New Stack’s coverage calls LangChain’s LangGraph the most mature agent framework, but its durable execution is decoupled from deployment, operations, and version management. You have to stitch them together yourself.

The upside of the IKEA model is freedom of shape. The downside is that you have to be the carpenter. Most teams don’t have that kind of bandwidth, so the agent stays at the demo stage and the last step to production never gets taken.

Claude Managed Agents: give the model a few doors to open

Anthropic took the opposite path. Claude Managed Agents follows a simple logic: Claude is the center, and the agent is a set of pipes through which Claude reaches into external systems. You configure a connector, and Claude can read your Slack, query your database. The agent runs entirely on Anthropic’s cloud and is bound exclusively to the Claude model. Data, invocations, execution context — all of it lives on Anthropic’s servers.

Anthropic’s sub-agents documentation and skills documentation spell out this logic clearly: connectors interface with MCP servers (hosted on Anthropic’s cloud), and custom connectors are how you bring your own MCP server into Anthropic’s cloud. What you’re configuring doesn’t feel like a new employee capable of independent action. It’s more like defining what Claude is allowed to touch.

The upside of this path is an out-of-the-box experience with zero ops. The cost stacks up across three layers. First, the agent isn’t software you own; it’s a configuration entry in the cloud. Change one line of instructions and there’s no diff to review, no version history to trace, no preview deploy to canary-test. Second, you’re locked into Claude as the model. You wait for Anthropic to ship upgrades, with no room to make your own model selection decisions. Third, all your data sits on Anthropic’s servers, which makes this path a non-starter for scenarios with strict compliance requirements.

The third way: build the agent as standalone software

Let’s return to that opening line. When eve says an agent is a directory, it’s not claiming that using folders to organize code is some novel thing. It’s actually saying three things, and each one targets a gap left by LangChain and Claude Managed Agents.

First, file names are definitions — no registration step required. In eve, the filename agent/tools/run_sql.ts is the tool name. The framework auto-discovers and auto-registers at build time. agent/skills/revenue-rules.md automatically becomes on-demand context — it only surfaces when the model touches on revenue topics. You don’t write a single line of glue code to attach a tool to the agent. eve’s official documentation lays this out plainly: eve is filesystem-first. A file’s location determines its role. There is no separate registry that needs to be manually kept in sync. Adding a tool is adding a file. Deleting a tool is deleting a file. Where LangChain has you writing registration code, eve lets the filesystem speak for itself.
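The "a file's location determines its role" idea can be sketched in a few lines. This is not eve's actual discovery code, which the article does not show. The `discover` function and `Discovered` type are hypothetical, and the only conventions assumed are the two paths quoted above (`agent/tools/*.ts` and `agent/skills/*.md`).

```typescript
// Hypothetical sketch of filesystem-first discovery; not eve's implementation.
interface Discovered {
  tools: string[];
  skills: string[];
}

// Derive tool and skill names purely from where each file lives.
function discover(paths: string[]): Discovered {
  const result: Discovered = { tools: [], skills: [] };
  for (const path of paths) {
    const tool = /^agent\/tools\/([^\/]+)\.ts$/.exec(path);
    const skill = /^agent\/skills\/([^\/]+)\.md$/.exec(path);
    if (tool) {
      result.tools.push(tool[1]); // the filename is the tool name
    } else if (skill) {
      result.skills.push(skill[1]); // the filename is the skill name
    }
    // Anything else plays no role in this sketch.
  }
  result.tools.sort();
  result.skills.sort();
  return result;
}

const found = discover([
  "README.md",
  "agent/tools/run_sql.ts",
  "agent/skills/revenue-rules.md",
]);
```

Here `found.tools` contains `run_sql` and `found.skills` contains `revenue-rules`. Removing a path from the input removes the capability, with no registry to keep in sync.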

Second, the agent is software you truly own. An eve agent is a Git repository. Changing one line of instructions is a commit — there’s a diff to review, history to trace. Every commit automatically generates a preview deploy. eve eval plugs into CI as a deploy gate. If something breaks, you roll back in seconds. Managing an agent is identical to managing a regular web application. Claude Managed Agents, by contrast, are configuration entries on the cloud — not code you can commit, diff, and roll back at will. Trace the divergence one layer deeper: Anthropic sees the model as the platform and the agent as an extension of the model; Vercel sees the model as a commodity — one line of code through AI Gateway swaps between Claude, GPT, and Gemini — and the agent as standalone software. Follow these two assumptions downstream, and the production lifecycle of an agent ends up looking entirely different.

Third, the production runtime is bundled into the same directory. eve packs three capabilities into the box.

Durable execution: every step is automatically checkpointed; deploys never interrupt in-flight sessions; waiting for approval consumes zero compute; the Workflow SDK guarantees that runs resume from the last good step, not from zero.

Sandbox: code written by the agent runs inside an isolated microVM with its own kernel and filesystem, fully separated from the host.

Channels: one command, eve channels add slack, and the agent appears in Slack. Approvals render as Slack buttons, questions become select menus, and you’ll even see a typing indicator when the agent is composing a response. The same agent can be present on Slack, Discord, Teams, and HTTP simultaneously, and sessions migrate freely between these channels: a question asked on Slack can continue on the web, and an HTTP webhook can automatically open an investigation thread on Slack.

LangChain doesn’t bundle these three capabilities. Claude Managed Agents doesn’t offer them at anywhere near this depth.
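"Resume from the last good step, not from zero" is easier to picture with a toy example. The sketch below is not the Workflow SDK. `Step`, `Checkpoint`, and `runDurably` are hypothetical names, the checkpoint is an in-memory object rather than durable storage, and the crash is simulated. It illustrates only the general checkpoint-and-skip technique.

```typescript
// Hypothetical sketch of checkpointed execution; not the Workflow SDK.
interface Step {
  name: string;
  run: () => string;
}

interface Checkpoint {
  completed: { [step: string]: string };
}

// Run steps in order, skipping any step whose output is already recorded.
// Returns the names of the steps actually executed in this call.
function runDurably(steps: Step[], checkpoint: Checkpoint): string[] {
  const executed: string[] = [];
  for (const step of steps) {
    if (Object.prototype.hasOwnProperty.call(checkpoint.completed, step.name)) {
      continue; // finished in an earlier run, do not repeat it
    }
    const output = step.run(); // may throw; earlier checkpoints survive
    checkpoint.completed[step.name] = output;
    executed.push(step.name);
  }
  return executed;
}

let fetchCalls = 0;
let analyzeCalls = 0;
const steps: Step[] = [
  { name: "fetch", run: () => { fetchCalls += 1; return "rows"; } },
  {
    name: "analyze",
    run: () => {
      analyzeCalls += 1;
      if (analyzeCalls === 1) throw new Error("simulated crash");
      return "summary";
    },
  },
];

const checkpoint: Checkpoint = { completed: {} };
let firstRunFailed = false;
try {
  runDurably(steps, checkpoint);
} catch (e) {
  firstRunFailed = true; // the first attempt dies during "analyze"
}
const resumed = runDurably(steps, checkpoint); // second attempt
```

On the second attempt only `analyze` runs, and `fetch` is never repeated. A production runtime would persist the checkpoint outside the process so that it also survives deploys and restarts.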

eve is like a fully furnished kitchen. The stove, fridge, and dishwasher are all in place. You just decide what groceries to put in the fridge. The cost is that you can’t change the kitchen layout (the directory conventions are non-negotiable), and by default this kitchen is built inside the Vercel building. Multi-platform support is officially “on the way,” with no timeline yet. But compared to IKEA flat-pack boards or giving Claude a few doors to open, this is already the shortest path to production.

The kitchen is ready. Now, where do the ingredients come from?

eve solves how an agent runs, how it deploys, how it connects to Slack, how it stays alive. It does not solve what the agent thinks or whether its outputs carry judgment. This is the easiest place for misreading to happen.

eve’s instructions.md and skills/ are loading slots, not content sources. The official blog is explicit: “That leaves the part no framework can write for you: what your agent actually does.” The fully furnished kitchen has its stove, fridge, and dishwasher all set up, but you still buy the ingredients, do the chopping, handle the seasoning. Whether your agent outputs a checklist or a judgment-driven analysis depends entirely on what you put into instructions.md and skills/. eve itself has nothing to do with this. eve’s default instructions.md example is a single line: “You are a concise assistant.”

And this is exactly what I’ve spent the past year working on. I call this system context infrastructure: reverse-distilling judgment principles from call transcripts, WeChat conversations, AI chat histories, and every correction made along the way — accumulating them into a cognitive context that loads on demand. eve’s instructions.md defaults to a one-line example. My SOUL.md is 73 lines of behavioral contract backed by over fifty cross-scenario-validated judgment principles. Slot the latter into the former’s loading position, and what you get is an agent with both production-grade runtime capability and the texture of personal cognition.

eve alone can run, but its outputs will likely stay at the consensus level: correct but mediocre — the kind of answer anyone asking an AI would get. Context infrastructure alone has depth, but no production runtime: it can’t connect to Slack on its own, has no durable execution, can’t deploy with one command. Put the two together and you have a complete picture. eve handles the runtime. Context infrastructure handles the cognitive content. They operate at different abstraction layers and do not compete with each other.
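The split between runtime and cognitive content can be sketched as a small context assembler. The runtime decides when something gets loaded, and the author supplies what gets loaded. Everything here is an assumption for illustration: `Skill`, `buildContext`, and the keyword-matching rule are hypothetical. The article does not say how eve decides that a skill is relevant, and the skill body is placeholder text.

```typescript
// Hypothetical sketch of on-demand context loading; not eve's mechanism.
interface Skill {
  name: string;
  keywords: string[]; // assumed trigger words, invented for this sketch
  body: string;
}

// Always include the instructions; include a skill only when the message
// mentions one of its keywords.
function buildContext(instructions: string, skills: Skill[], message: string): string {
  const lower = message.toLowerCase();
  const parts: string[] = [instructions];
  for (const skill of skills) {
    let relevant = false;
    for (const keyword of skill.keywords) {
      if (lower.indexOf(keyword.toLowerCase()) !== -1) relevant = true;
    }
    if (relevant) parts.push("## " + skill.name + "\n" + skill.body);
  }
  return parts.join("\n\n");
}

const skills: Skill[] = [
  { name: "revenue-rules", keywords: ["revenue"], body: "(placeholder skill text)" },
];
const instructions = "You are a concise assistant.";

const plainContext = buildContext(instructions, skills, "What time is it?");
const revenueContext = buildContext(instructions, skills, "How did revenue change?");
```

With the one-line default instructions, `plainContext` is just that line. The assembler adds no depth of its own, so output quality depends on what is placed in the slots.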

The division of labor in the age of agents is coming into focus. Anthropic owns the protocol: how models understand the world, how they operate tools — MCP and the skills format are already de facto standards. Vercel owns the runtime: how agents run, how they deploy, how they survive tomorrow. Each person’s own context infrastructure owns the cognitive content: what the agent thinks, what it judges, what it outputs. Three layers, each in its own lane. No one needs to replace anyone else.