Fable 5 Is Expensive, but Anthropic Published the Cost-Saving Answer Two Months Ago

Today Anthropic released Claude Fable 5, the public version of Mythos — the model that quietly rocked Wall Street two months ago during its private rollout. Pricing is $10 per million input tokens and $50 per million output tokens, twice the cost of Opus 4.8 and more than three times Sonnet 4.6. Subscription users have a window: Pro, Max, Team, and Enterprise plans include Fable 5 at no extra cost from today through June 22, after which it switches to usage credits; on the API, it’s pay-per-token from day one. In other words, two weeks from now, “should I use the best model in my agent pipeline” becomes a question with a real price tag attached for the first time.

Anthropic published its own answer to that question two months ago, and almost no one noticed. On April 9, 2026, they shipped the advisor tool: let a cheaper model handle the work end-to-end, and make the most expensive model an on-demand consultant you pay for by the call. Coincidentally, I published a piece that same day on the AgentOpt paper and one of its striking experimental results: in a planner-solver pipeline, putting Claude Opus in the planner slot ranked near the bottom across 81 model combinations; switching to Ministral 8B as planner and Opus as solver more than doubled the accuracy.

One paper and one API feature, released the same day, point in apparently opposite directions yet rest on the same underlying judgment. With Fable 5 putting a price tag on the question, that judgment matters more today than it did two months ago.

An Opus That Only Gives Advice

The mechanism of the advisor tool can be explained in one sentence. The executor (Haiku 4.5 or Sonnet 4.6) runs the task end-to-end: calling tools, reading results, iterating. When it hits a decision it’s uncertain about, it calls the advisor (Opus 4.7 or 4.8). The advisor reads the full conversation history and returns a chunk of guidance — typically 400 to 700 tokens — containing a plan, a correction, or a stop signal. The executor takes that advice and keeps going. This entire loop happens inside a single /v1/messages request on Anthropic’s servers; the client writes no orchestration logic at all.
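To make the control flow concrete, here is a toy client-side simulation of that loop. It is only a sketch of the shape described above, not Anthropic's server-side implementation: the names run_with_advisor, toy_executor, toy_advisor, and the "ASK_ADVISOR" / "DONE" signals are all hypothetical, and both "models" are plain Python functions.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Transcript:
    """Shared conversation history that both models can read."""
    entries: List[str] = field(default_factory=list)


def run_with_advisor(
    executor_step: Callable[[Transcript], str],
    advisor: Callable[[Transcript], str],
    max_uses: int = 3,
    max_steps: int = 10,
) -> Transcript:
    """The executor drives every step; the advisor only contributes text."""
    transcript = Transcript()
    uses = 0
    for _ in range(max_steps):
        action = executor_step(transcript)
        if action == "DONE":
            break
        if action == "ASK_ADVISOR":
            if uses < max_uses:
                uses += 1
                # The advisor sees the full history and returns guidance only.
                transcript.entries.append("advice: " + advisor(transcript))
            else:
                transcript.entries.append("advice: (consultation budget exhausted)")
            continue
        # Any other action is work the executor performs itself.
        transcript.entries.append("executor: " + action)
    return transcript


def toy_executor(t: Transcript) -> str:
    """Hypothetical executor: asks once, acts on the advice, then stops."""
    if not any(e.startswith("advice:") for e in t.entries):
        return "ASK_ADVISOR"
    if not any(e.startswith("executor:") for e in t.entries):
        return "apply the suggested plan"
    return "DONE"


def toy_advisor(t: Transcript) -> str:
    """Hypothetical advisor: no tools, no side effects, text out only."""
    return "start with the failing test"


result = run_with_advisor(toy_executor, toy_advisor)
print(result.entries)
```

Note that the advisor function never appends to the transcript itself and never takes an action; everything it produces passes through the executor's loop.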

The real design insight is in what the advisor is not allowed to do. It has no tools. It produces no user-facing output. Even its own thinking is discarded by the server before the response returns. The only thing it can do is put its judgment into text and hand it back to the executor. Throughout the entire process, the steering wheel stays with the cheaper model.

Anthropic explicitly notes in their official blog post that this design inverts the common sub-agent pattern. In conventional multi-model architectures, a large model acts as orchestrator — decomposing tasks at the top level and handing work to a pool of smaller models. In the advisor pattern there’s no task decomposition, no worker pool, no orchestration logic. The small model drives itself forward and escalates only when it needs to.

A Seeming Contradiction, a Shared Judgment

Put these two things side by side and the first impression is conflict. AgentOpt concludes that weak models should be planners and strong models should be solvers. The advisor tool reverses this: strong model produces the plan, weak model executes. Two contradictory conclusions — how can they be the same judgment?

The answer is in AgentOpt’s failure mechanism. Opus failed as a planner not because it couldn’t plan. It didn’t stay in its role: in seven of nine experiments, it bypassed the downstream solver entirely and answered the question directly, producing a mediocre direct answer that broke the pipeline’s entire reasoning chain. Ministral 8B succeeded because it knew it couldn’t answer directly — so it dutifully decomposed the task, called tools, and passed sub-problems downstream.

The real variable wasn’t which model was better at planning. It was whether the model holding the control loop would stay in bounds. The advisor tool addresses this risk at the source: Opus’s intelligence is still there, but its tools are taken away, its control is taken away, and its output can only re-enter the loop as advisory text. Even if it wanted to overstep, it has no mechanism to do so. AgentOpt exposed the problem through experiment; the advisor tool answered it through permission design.

Both share the same principle: give control to the model that stays in role, make intelligence an on-demand resource. Whether a model’s capability is an advantage depends on where you put it.

Evidence

Anthropic’s own benchmarks provide three sets of numbers. First: Sonnet 4.6 with an Opus advisor scores 2.7 percentage points higher than Sonnet solo on SWE-bench Multilingual (74.8% vs 72.1%), while per-task cost actually falls 11.9%. The cost drops because better planning reduces trial-and-error rounds: the extra spend on the advisor is recovered by fewer executor iterations. Second: the same pairing scores higher on BrowseComp and Terminal-Bench 2.0, with per-task cost still below Sonnet solo. Third: Haiku 4.5 with an Opus advisor takes BrowseComp from 19.7% to 41.2%; compared to Sonnet solo, the score is 29% lower but the cost is 85% lower, which is the right profile for high-volume, cost-sensitive workloads.

One caveat: all of these numbers come from Anthropic’s own evaluations. There is no independent replication at the time of writing. Three launch customers — Bolt, Genspark, and Eve — provided testimonials on release day; Genspark mentioned the advisor outperformed their own custom planning tool, which carries somewhat more signal than a typical customer quote. The directional signal is consistent, but until you run an eval on your own workload, treat these numbers as reference points, not guarantees. Anthropic themselves recommend testing all three configurations: Sonnet solo, Sonnet with advisor, and Opus solo.
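A three-way comparison like that can be wired up in a few lines. The sketch below shows only the plumbing: Config, compare, and stub_run are hypothetical names, and stub_run is a placeholder that returns fixed values rather than real results. In practice you would swap it for a function that runs one task from your own eval set under the given configuration and returns whether it passed and what it cost.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Config:
    name: str
    executor: str
    advisor: Optional[str]  # None means the executor runs solo


# A runner takes (config, task) and returns (passed, cost_in_usd).
RunFn = Callable[[Config, str], Tuple[bool, float]]

CONFIGS: List[Config] = [
    Config("sonnet-solo", "claude-sonnet-4-6", None),
    Config("sonnet+advisor", "claude-sonnet-4-6", "claude-opus-4-8"),
    Config("opus-solo", "claude-opus-4-8", None),
]


def compare(configs: List[Config], tasks: List[str], run_task: RunFn) -> Dict[str, Dict[str, float]]:
    """Run every task under every config and summarize pass rate and cost."""
    report: Dict[str, Dict[str, float]] = {}
    for cfg in configs:
        results = [run_task(cfg, task) for task in tasks]
        passed = sum(1 for ok, _ in results if ok)
        total_cost = sum(cost for _, cost in results)
        report[cfg.name] = {
            "pass_rate": passed / len(tasks),
            "cost_per_task": total_cost / len(tasks),
        }
    return report


def stub_run(cfg: Config, task: str) -> Tuple[bool, float]:
    """Placeholder only: replace with a real call plus your own grader."""
    return (True, 0.0)


report = compare(CONFIGS, ["task-a", "task-b"], stub_run)
for name, stats in report.items():
    print(name, stats)
```

Keeping the runner as an injected function means the same task list and the same grader are applied to all three configurations, which is the point of the comparison.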

The Pattern Is the Community’s; the Primitive Is Anthropic’s

Weak model executes, strong model advises — this pattern has been in the community for a while. Aider launched an architect/editor two-model mode in September 2024: strong model produces the plan, cheap model generates the actual file edits — and claimed the best benchmark score on their own leaderboard at the time. Sourcegraph Amp’s oracle tool is structurally almost identical to the advisor tool: the main agent autonomously decides when to consult a stronger model on debugging and planning. Cursor’s Plan mode belongs to the same family. Anthropic openly acknowledges in their blog that developers had already converged on this strategy, and what they’re doing is turning it into a single line of configuration in the API.

What only Anthropic has done, though, is make it a server-side primitive. OpenAI’s GPT-5 router only exists inside the ChatGPT product, isn’t open as an API, and operates at the request level: the whole request goes to one model, there’s no mid-generation consultation. The Agents SDK’s agents-as-tools feature is conceptually close, but it runs in the user’s process — each consultation is a separate API request, and context is assembled by the client. On Google’s side, Vertex AI’s Model Optimizer picks a model per request; Deep Think is single-model internal reasoning depth. Neither has a mechanism for a second model to intervene mid-execution. There’s also an indirect indicator: when LiteLLM adapted to support the advisor tool, it had to simulate the entire orchestration loop in the gateway layer for non-Anthropic models. If another provider had a native equivalent, that workaround wouldn’t be necessary.

How to Use It Today

On the API, it’s genuinely one line of configuration. The documentation labels it beta and says to contact your account team, but in practice adding the beta header is all it takes: LiteLLM and Vercel AI SDK have both already adapted, multiple independent developers have posted working code, and no one has mentioned an approval step.

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.beta.messages.create(
    model="claude-sonnet-4-6",            # executor
    betas=["advisor-tool-2026-03-01"],
    max_tokens=4096,
    tools=[
        {
            "type": "advisor_20260301",
            "name": "advisor",
            "model": "claude-opus-4-8",   # advisor
            "max_uses": 3,                # max consultations per request
        },
        # ...your other tools
    ],
    messages=[...],
)

Model pairing has constraints: the advisor must be at least as capable as the executor. For cheaper executors like Haiku 4.5 and Sonnet 4.6, the advisor options are Opus 4.7 or 4.8. On the billing side, advisor tokens are charged at the advisor model’s rate and appear as a separate line in your usage block; max_uses is the primary cost control lever. In Claude Code, typing /advisor turns it on directly.
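One way to keep those pairing rules from surprising you at request time is to validate them before building the tool entry. This is a minimal sketch: build_advisor_tool and SUPPORTED_PAIRS are hypothetical helpers, the table covers only the pairings named above, and the model IDs other than claude-sonnet-4-6 and claude-opus-4-8 are assumed to follow the same naming pattern rather than taken from documentation.

```python
from typing import Any, Dict, Set

# Executor -> advisors listed above as valid for that executor.
# ASSUMPTION: "claude-haiku-4-5" and "claude-opus-4-7" are guessed IDs.
SUPPORTED_PAIRS: Dict[str, Set[str]] = {
    "claude-haiku-4-5": {"claude-opus-4-7", "claude-opus-4-8"},
    "claude-sonnet-4-6": {"claude-opus-4-7", "claude-opus-4-8"},
}


def build_advisor_tool(executor: str, advisor: str, max_uses: int = 3) -> Dict[str, Any]:
    """Return the advisor tool entry, or raise if the pairing is not listed."""
    if advisor not in SUPPORTED_PAIRS.get(executor, set()):
        raise ValueError(f"unsupported pairing: {executor} -> {advisor}")
    if max_uses < 1:
        raise ValueError("max_uses must be at least 1")
    return {
        "type": "advisor_20260301",
        "name": "advisor",
        "model": advisor,
        "max_uses": max_uses,  # primary cost control lever
    }


tool = build_advisor_tool("claude-sonnet-4-6", "claude-opus-4-8", max_uses=3)
print(tool)
```

The returned dict has the same shape as the tool entry in the request above, so it can be dropped into the tools list alongside your other tools.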

What about Fable 5? It entered the compatibility table on the day it launched, but only in one slot: when the executor is Fable 5, the advisor can be Fable 5 — a self-pair. The combination you actually want, Sonnet or Haiku doing the work with Fable 5 as advisor, isn’t available yet. But there’s a precedent: after Opus 4.8 launched in May, it quickly appeared as an advisor option for cheaper executors. More importantly, Fable 5 is priced exactly for the advisor role. At $50 per million output tokens, running it end-to-end in an agent is difficult to justify on most workloads; but as an advisor that produces a few hundred tokens of guidance per call, high unit price multiplied by small volume lands in a comfortable range. Before that cross-tier pairing opens up, subscription users have one more tool: until June 22, Fable 5 is free in Claude Code — two weeks to get a feel for its ceiling. For the long run, if you want to keep intelligence at that level in your agent pipeline without being bankrupted by the bill, the advisor pattern is the structure.
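The "high unit price multiplied by small volume" arithmetic is easy to check. The sketch below uses the Fable 5 prices quoted at the top of this piece and the 400 to 700 token advice size; the 20,000-token history is a made-up figure for illustration, and the calculation is hypothetical anyway, since Fable 5 is not yet available as an advisor for cheaper executors. The helper name cost_usd is my own.

```python
# Fable 5 list prices quoted in this article (USD per million tokens).
FABLE_INPUT_PER_MTOK = 10.0
FABLE_OUTPUT_PER_MTOK = 50.0


def cost_usd(
    input_tokens: int,
    output_tokens: int,
    in_rate: float = FABLE_INPUT_PER_MTOK,
    out_rate: float = FABLE_OUTPUT_PER_MTOK,
) -> float:
    """Token cost in USD at per-million-token rates."""
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate


# Output side of a single piece of advice (400 to 700 tokens).
advice_low = cost_usd(0, 400)
advice_high = cost_usd(0, 700)
print(f"advice output: ${advice_low:.3f} to ${advice_high:.3f} per call")
# → advice output: $0.020 to $0.035 per call

# The advisor also reads the conversation history as input.
# ASSUMPTION: a 20,000-token history, chosen only for illustration.
one_call = cost_usd(20_000, 700)
print(f"one call with a 20k-token history: ${one_call:.3f}")
# → one call with a 20k-token history: $0.235
```

The output side stays at a few cents per call. The input side scales with how much history the advisor has to read, so in long loops that term, together with max_uses, is what determines the bill.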

Caveats

Four things worth saying upfront. First, the Claude Code integration currently has the most friction. The advisor sub-reasoning tokens get double-counted into the main context — an issue that’s been open for five weeks without a fix — and the CLI refuses to use Haiku as an executor, contradicting the API documentation. It also breaks through a LiteLLM proxy. If you’re evaluating this feature seriously, use direct API calls. Second, the platform scope is limited: only the Claude API and Claude Platform on AWS support it; AWS Bedrock, Vertex AI, and Microsoft Foundry do not. Third, there are clear cases where it doesn’t apply: single-turn Q&A with nothing to plan; products where users choose their own model (adding an advisor layer just complicates the cost model); tasks where every step genuinely requires frontier capability, in which case Opus directly is better. Fourth, the advisor’s own prompt caching is a separate toggle; community experience suggests the rule of thumb is that caching pays for itself after three or more advisor calls per conversation — worth enabling for long loops, not short tasks.

Nobody Applauded, but Everyone Is Integrating

The official blog post got 6 upvotes and 1 comment on Hacker News on release day. Within the same week, LiteLLM and Vercel AI SDK completed their integrations, and the OpenCode community opened a feature request. Those two facts together are a fairly precise signal: developer infrastructure is rapidly digesting the feature, public attention hasn’t reached it yet, and everyone is still watching for the next model release.

The mismatch in attention is revealing on its own. For the past two years, “which model to use” has been roughly equivalent to “use the strongest model,” and that equivalence held when tasks were handled by a single model. Fable 5’s pricing puts a price tag on that equivalence: keep using the strongest model as the default and pay $50 per million output tokens; put it in the right role and pay per consultation. The AgentOpt paper argued that model quality is a function of role and pipeline, not something you can evaluate in isolation. Two months ago that was a data point from a controlled experiment. Now it’s a parameter in the API, and the next step is most likely Fable 5 appearing in the advisor slot for cheaper executors. If your agent is running on Sonnet today, run all three configurations — Sonnet solo, Sonnet with advisor, Opus solo — and spend half a day figuring out whether this judgment holds on your own workload. While you’re at it, use the two weeks before June 22 to feel out Fable 5’s ceiling in Claude Code. After that, you’ll know which role you’re willing to pay for it in.