
The Decomposition Behind Claude Design: Reverse Engineering How an AI Designer Works, from an Open-Source Plugin

Since Claude Design launched, social media has filled with a familiar kind of post: someone shares a screenshot, says they spent five minutes getting Claude to generate an entire brand kit, a landing page, and a marketing one-pager, and adds, “Better than what I’d do in Figma over three days.”

These posts show the output. They don’t explain how it works. Anthropic didn’t go into technical detail in the launch post. Claude Design’s internal prompts are a black box.

But Anthropic did open-source a set of design workflow plugins on GitHub. The plugins don’t claim “this is how Claude Design works internally,” but they cover the same problem domain. Take them apart, and you can reverse-engineer how an AI designer is put together.

One design sprint, six people

Picture a design team mid-sprint. You walk in and you don’t see one person staring at Figma. You see six people doing different things.

One is doing design critique. She’s evaluating visual hierarchy, checking whether the interaction flow has unnecessary steps, comparing the current draft against the design system. Her output isn’t “it feels off.” It’s a severity-graded finding list: critical, major, minor, each with concrete recommendations.

Another is managing the design system. He’s auditing naming consistency across components, tracking token coverage, checking whether every component’s states and variants are documented.

A third is doing developer handoff. He’s extracting measurements, token references, responsive breakpoints, animation specs from the design file. He’s building component property tables, state interaction matrices, edge case checklists. Rule number one on his checklist: “Don’t assume anything. What you don’t write, the developer will guess.”
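At its core, the handoff checklist is a coverage problem: every component state either has a written spec or becomes a guess. The minimal sketch below makes that concrete. It assumes a hypothetical button with made-up variants, states, and token names (none of them come from the plugin), enumerates the full state matrix, and lists the cells nobody documented.

```python
from itertools import product


def missing_cells(variants, states, documented):
    """Return every (variant, state) pair with no written spec.

    Each gap is something the developer would otherwise have to guess.
    """
    return [cell for cell in product(variants, states) if cell not in documented]


# Hypothetical button component: variants x interaction states.
variants = ["primary", "secondary"]
states = ["default", "hover", "focus", "disabled"]

# What the designer actually wrote down (token names are made up for illustration).
documented = {
    ("primary", "default"): "bg: color.action.primary",
    ("primary", "hover"): "bg: color.action.primary-hover",
    ("primary", "focus"): "outline: 2px focus.ring",
    ("primary", "disabled"): "opacity: 0.4, not clickable",
    ("secondary", "default"): "border: color.border.strong",
    ("secondary", "hover"): "bg: color.surface.hover",
}

for variant, state in missing_cells(variants, states, documented):
    print(f"undocumented: {variant} / {state}")
```

Here the secondary button's focus and disabled states get flagged. Those are exactly the two cells a developer would otherwise fill in by guessing.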

A fourth is writing UX copy. She’s facing a list of CTAs, error messages, empty states, confirmation dialogs. Three alternatives each, with tone annotations and localization notes.

Someone else is running an accessibility scan, checking color contrast, keyboard navigation order, and screen reader behavior against every WCAG 2.1 AA criterion. And another person is synthesizing the transcript from last week’s user interviews into themes, insights, opportunities, and user segments.
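Of those three checks, color contrast is the one that reduces cleanly to arithmetic. The sketch below implements the WCAG 2.1 relative-luminance and contrast-ratio formulas for success criterion 1.4.3, which requires 4.5:1 for normal text and 3:1 for large text. The function names are illustrative and are not taken from the plugin. Keyboard order and screen reader behavior need a real browser and assistive technology, so the sketch doesn't cover them.

```python
def _channel(c8: int) -> float:
    """Linearize one 8-bit sRGB channel per the WCAG 2.1 relative-luminance definition."""
    c = c8 / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a #rrggbb color, from 0 (black) to 1 (white)."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _channel(r) + 0.7152 * _channel(g) + 0.0722 * _channel(b)


def contrast_ratio(fg: str, bg: str) -> float:
    """(L_lighter + 0.05) / (L_darker + 0.05), independent of argument order."""
    lighter, darker = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa(fg: str, bg: str, large_text: bool = False) -> bool:
    """SC 1.4.3 (AA): 4.5:1 for normal text, 3:1 for large text."""
    return contrast_ratio(fg, bg) >= (3.0 if large_text else 4.5)


for fg in ("#777777", "#767676"):
    print(fg, round(contrast_ratio(fg, "#ffffff"), 2), passes_aa(fg, "#ffffff"))
```

The two grays show why the check has to be exact rather than eyeballed. #777777 on white lands just under 4.5:1 and fails for body text. #767676, one step darker, clears the threshold.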

These aren’t six different people. This is Anthropic’s open-source Design Plugin. It decomposes design work into six activities, each with its own set of operating instructions for Claude.

Six parts, for a reason

You could cram design critique and research synthesis into a single prompt, but it doesn’t work well. Design critique cares about visual hierarchy, usability, and consistency; those are its quality criteria. UX copy cares about clarity, tone, and actionability; its quality criteria are completely different. Shove them into one prompt and the model can’t cleanly switch between standards.

A human designer doesn’t sketch mockups while running an accessibility audit checklist. These two activities demand different attention structures. Audit mode is checklist traversal, criterion by criterion. Critique mode is holistic scanning and judgment. Separating them isn’t for aesthetics. It’s so each sub-task has clean evaluation criteria.

Anthropic’s plugin structure reveals this judgment: six SKILL.md files, each defining its sub-task’s context, input, rubric, and output format. Claude auto-loads the right one for the right moment. Each instruction targets 1,500 to 2,000 words. Knowledge beyond that threshold is moved to a references folder and loaded only when needed. This is the engineering meaning of progressive disclosure: give the model only the judgment framework it needs right now, and don’t let irrelevant standards pollute its attention.
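Here is a rough sketch of progressive disclosure under stated assumptions. The `Skill` class, the keyword triggers, and `build_context` are hypothetical stand-ins; in practice Claude itself decides which skill applies. Only the 2,000-word ceiling comes from the text. The sketch shows two things: one skill's rubric enters the context at a time, and reference material is added only when explicitly requested.

```python
from dataclasses import dataclass, field

MAX_BODY_WORDS = 2000  # upper end of the 1,500-2,000 word target from the text


@dataclass
class Skill:
    name: str
    triggers: tuple[str, ...]  # hypothetical routing hints; the real selection is made by the model
    body: str  # the SKILL.md instructions: context, input, rubric, output format
    references: dict[str, str] = field(default_factory=dict)  # references/ files, loaded on demand


def build_context(skills, request, wanted_refs=()):
    """Assemble one skill's instructions (plus any requested references) ahead of the request."""
    text = request.lower()
    chosen = next((s for s in skills if any(t in text for t in s.triggers)), None)
    if chosen is None:
        return request  # no skill applies, so nothing is added
    if len(chosen.body.split()) > MAX_BODY_WORDS:
        raise ValueError(f"{chosen.name}: body too long, move detail into references/")
    extras = [chosen.references[r] for r in wanted_refs if r in chosen.references]
    return "\n\n".join([chosen.body, *extras, request])


skills = [
    Skill("design-critique", ("critique", "review this design"),
          "Rubric: first impression, usability, visual hierarchy, consistency, accessibility.",
          {"heuristics.md": "Extended usability heuristics..."}),
    Skill("accessibility-review", ("accessibility", "wcag"),
          "Rubric: walk every WCAG 2.1 AA criterion, one at a time.",
          {"wcag-aa.md": "Full table of AA success criteria..."}),
]

ctx = build_context(skills, "Run an accessibility check on the signup form")
print(ctx)
```

For an accessibility request, the critique rubric never reaches the context, and the long criteria table stays on disk until a caller asks for `wcag-aa.md`.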

Taste isn’t about making the model “draw better”

If the Design Plugin handles design workflow, a separate plugin handles something harder: giving AI-generated frontend output actual taste. That’s the Frontend Design Plugin.

A vanilla LLM’s default frontend output already has aesthetics. The problem is that those aesthetics are the statistical average of its training distribution. The model never had to learn what good design looks like; it only had to learn what most websites look like: Inter font, purple gradients, centered hero sections, card layouts, rounded buttons, white backgrounds. The model isn’t bad at design. It’s too good at consensus design.

Anthropic didn’t optimize the model. They wrote a short prompt. Its core instruction: pick a bold conceptual direction first, then execute.

“Before coding, commit to a BOLD aesthetic direction: brutally minimal, maximalist chaos, retro-futuristic, organic/natural, luxury/refined, playful/toy-like, editorial/magazine, brutalist/raw, art deco/geometric.” This isn’t a suggestion. It’s a constraint. “Never use Inter, Roboto, Arial, system fonts. No purple gradients on white backgrounds. Don’t converge on the same choices.”

A model’s instinct is to regress to the mean. This prompt tells it: pick an extreme, then push that extreme through five dimensions: typography, color, motion, composition, background detail. If you pick maximalist, your code had better have complex animations, asymmetry, and layered textures. If you pick refined minimalism, obsess over negative space, letter-spacing, and subtle transitions.
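One way to make the constraint concrete is to check both the plan and the output. The following is a hypothetical lint and is not part of the Frontend Design Plugin, which is plain prompt text. It requires a committed direction plus a decision for each of the five dimensions, and it scans generated CSS for the banned fonts and for purple gradients. Two parts of it are assumptions: what counts as a “system font,” and the hue window used to define “purple.” The gradient check also ignores the background color and reads only six-digit hex stops.

```python
import colorsys
import re

DIRECTIONS = {
    "brutally minimal", "maximalist chaos", "retro-futuristic", "organic/natural",
    "luxury/refined", "playful/toy-like", "editorial/magazine", "brutalist/raw",
    "art deco/geometric",
}
DIMENSIONS = ("typography", "color", "motion", "composition", "background detail")
# Assumption: what counts as a "system font" for this check.
BANNED_FONTS = {"inter", "roboto", "arial", "system-ui", "-apple-system",
                "blinkmacsystemfont", "segoe ui"}


def check_brief(direction, plan):
    """A plan must commit to one direction and make a decision on all five dimensions."""
    problems = [] if direction in DIRECTIONS else [f"no committed direction: {direction!r}"]
    problems += [f"no decision for {dim}" for dim in DIMENSIONS if not plan.get(dim)]
    return problems


def _is_purple(hex6):
    r, g, b = (int(hex6[i:i + 2], 16) / 255 for i in (0, 2, 4))
    h, s, _ = colorsys.rgb_to_hsv(r, g, b)
    return 0.69 <= h <= 0.85 and s > 0.3  # assumed window, roughly 250-305 degrees of hue


def check_css(css):
    """Flag banned font families and gradients with purple stops (hex colors only)."""
    problems = []
    for decl in re.findall(r"font-family\s*:\s*([^;}]+)", css, flags=re.IGNORECASE):
        for family in decl.split(","):
            name = family.strip().strip("'\"").lower()
            if name in BANNED_FONTS:
                problems.append(f"banned font: {name}")
    for stops in re.findall(r"(?:linear|radial|conic)-gradient\(([^;}]*)", css, flags=re.IGNORECASE):
        if any(_is_purple(h) for h in re.findall(r"#([0-9a-fA-F]{6})\b", stops)):
            problems.append("purple gradient")
    return problems


bad = "h1 { font-family: 'Inter', sans-serif; } .hero { background: linear-gradient(135deg, #7c3aed, #a855f7); }"
print(check_css(bad))
```

A lint like this can only catch the clichés. It cannot supply the concept, which is why the prompt puts the direction first and the bans second.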

The prompt is 42 lines long. No external models. No API calls. It’s plain text injected into Claude’s frontend task context. But it does something most prompts can’t: before Claude even starts designing, it shoves the search space from “average UI” toward “UI with a concept.”

This reveals a deeper pattern. Piling on details (“button blue, 8px border-radius, 16px padding”) only pins the model to a specific average. Give it a conceptual anchor (“this is brutalist minimalism”) and let every detail serve that concept, and the model has a reason to resist the mean. The conceptual anchor is where taste comes from.

Connected to information categories, not specific SaaS

The Design Plugin also gives each sub-task a connector. Its .mcp.json ships with Slack, Figma, Linear, Asana, Atlassian, Notion, and Intercom pre-configured. But the plugin’s instructions never tell any sub-task to “pull from Figma.” Instead, they use placeholders: Design Tool, User Feedback, Knowledge Base, Project Tracker, Product Analytics. In Anthropic’s plugin system, these placeholders (written as ~~design tool in the source) are resolved at runtime to whatever service the user has actually connected.
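This minimal sketch of the placeholder idea assumes the simplest possible mechanism: a literal string substitution that replaces `~~category` markers with whatever the user connected. The real plugin system may instead leave interpretation to the model. `resolve` and the sample instruction are illustrative. Only the category names are taken from the text.

```python
import re

# Category names from the text; the "~~" prefix mirrors the placeholders in the plugin source.
CATEGORIES = ("design tool", "user feedback", "knowledge base", "project tracker", "product analytics")
PLACEHOLDER = re.compile(r"~~(" + "|".join(map(re.escape, CATEGORIES)) + r")")


def resolve(instructions, connected):
    """Replace each ~~category with the service this user connected for it (hypothetical mechanism)."""
    def substitute(match):
        category = match.group(1)
        return connected.get(category, f"[no {category} connected]")
    return PLACEHOLDER.sub(substitute, instructions)


step = "Pull the current frames from ~~design tool, then file gaps in ~~project tracker."
print(resolve(step, {"design tool": "Figma", "project tracker": "Linear"}))
print(resolve(step, {"design tool": "Sketch"}))
```

The instruction text is written once. Which product ends up in the sentence depends on the user's setup, and an empty slot is reported rather than silently guessed.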

CONNECTORS.md lays out the abstraction clearly. Each sub-task needs information from a category:

Design critique needs Design Tool, User Feedback, and Knowledge Base. Design handoff needs Design Tool and Project Tracker. Research synthesis needs User Feedback, Product Analytics, and Knowledge Base.

Each category is a slot that accepts different concrete tools. The Design Tool slot can hold Figma, Sketch, Adobe XD, or Framer. Project Tracker can hold Linear, Asana, Jira, or Shortcut. User Feedback can hold Intercom, Productboard, or Dovetail.

This abstraction matters because it asks a different question. Not “do you have Figma,” but “can you get the raw structure and tokens of the design file.” Not “do you use Intercom,” but “do you have a channel for real user complaints.” Lift the dependency from specific products to information categories, and switching tools doesn’t break the workflow. You’re just swapping what fills the slot.
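The slot abstraction can be sketched as two tables. One records which concrete tools fit each category; the other records which categories each sub-task needs. The tool and category names come from the text, while `fill_slots` and `missing_for` are hypothetical helpers. Knowledge Base and Product Analytics are left without concrete tools because the text doesn’t name any for them.

```python
# Concrete tools each slot accepts, as listed in the text.
SLOTS = {
    "Design Tool": {"Figma", "Sketch", "Adobe XD", "Framer"},
    "Project Tracker": {"Linear", "Asana", "Jira", "Shortcut"},
    "User Feedback": {"Intercom", "Productboard", "Dovetail"},
    "Knowledge Base": set(),      # not enumerated in the text; left empty
    "Product Analytics": set(),   # not enumerated in the text; left empty
}

# Categories each sub-task needs, per CONNECTORS.md as summarized above.
NEEDS = {
    "design critique": ("Design Tool", "User Feedback", "Knowledge Base"),
    "design handoff": ("Design Tool", "Project Tracker"),
    "research synthesis": ("User Feedback", "Product Analytics", "Knowledge Base"),
}


def fill_slots(installed):
    """Pick one installed tool per category (alphabetically first), or None if nothing fits."""
    return {cat: next((t for t in sorted(installed) if t in tools), None)
            for cat, tools in SLOTS.items()}


def missing_for(task, installed):
    """Categories the task needs that no installed tool can fill."""
    filled = fill_slots(installed)
    return [cat for cat in NEEDS[task] if filled[cat] is None]


print(missing_for("design handoff", {"Figma", "Linear"}))
print(missing_for("design handoff", {"Sketch", "Jira"}))
print(missing_for("design critique", {"Sketch", "Jira"}))
```

Replacing Figma and Linear with Sketch and Jira leaves the handoff check unchanged, because both setups fill the same two slots. The critique check, by contrast, reports which categories are still empty, not which brand is missing.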

Claude Design is more than this plugin

Equating the Design Plugin with Claude Design the product would be wrong. Claude Design also has a canvas, interactive refinement mechanisms, an onboarding flow that auto-extracts brand design systems from a codebase, and a handoff bundle layer that packages designs for Claude Code to consume. Those are product experiences the plugin alone doesn’t explain.

But there’s a deeper point here.

What the Design Plugin gives Claude isn’t “deeper design knowledge.” Claude already knows what good design looks like. What the plugin gives it is the field’s evaluation framework: what makes a good critique, what makes a good audit, what counts as a complete handoff, what counts as clear UX copy. This isn’t capability injection. It’s criteria transfer.

This is evaluation-first thinking applied to AI tool design: don’t make the model smarter. Give it a framework for judging “what counts as good.” The model is a general reasoning engine. It doesn’t know how to critique a design well—not because it lacks intelligence, but because nobody gave it the operational structure of critique. The plugin gives it that structure: first impression, then usability, then visual hierarchy, then consistency, then accessibility. Each finding gets a severity and a recommendation.
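The critique structure described above maps naturally onto a data shape. This sketch is an assumption about how such output could be represented and does not reproduce the plugin's actual format. It fixes the five stages in order, requires every finding to carry a severity and a non-empty recommendation, and sorts the report with the most severe findings first.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    CRITICAL = 0  # lower value sorts first
    MAJOR = 1
    MINOR = 2


# The order the critique walks through, as described in the text.
STAGES = ("first impression", "usability", "visual hierarchy", "consistency", "accessibility")


@dataclass(frozen=True)
class Finding:
    stage: str
    severity: Severity
    issue: str
    recommendation: str

    def __post_init__(self):
        if self.stage not in STAGES:
            raise ValueError(f"unknown stage: {self.stage!r}")
        if not self.recommendation.strip():
            raise ValueError("every finding needs a concrete recommendation")


def report(findings):
    """Most severe first; ties broken by the order of the critique stages."""
    ordered = sorted(findings, key=lambda f: (f.severity, STAGES.index(f.stage)))
    return [f"[{f.severity.name.lower()}] {f.stage}: {f.issue} -> {f.recommendation}" for f in ordered]


findings = [
    Finding("visual hierarchy", Severity.MAJOR, "Two primary CTAs compete above the fold",
            "Demote the secondary action to a text link"),
    Finding("usability", Severity.CRITICAL, "Three screens of signup before any value is shown",
            "Let users try the product before creating an account"),
    Finding("consistency", Severity.MINOR, "Card radius differs from the design system",
            "Use the system's radius token instead of a hard-coded value"),
]
print("\n".join(report(findings)))
```

The validation in `__post_init__` is the criteria transfer in miniature. The schema itself rejects “it feels off” because a finding without a recommendation cannot be created.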

The same decomposition pattern shows up in Anthropic’s other open-source plugins. The engineering plugin breaks code review into bug detection, project guideline compliance, and code quality. The legal plugin breaks contract review into NDA triage, compliance checks, and vendor risk assessment. Browse the GitHub repo and you’ll find the same architecture: first define “how a good contract reviewer actually thinks when they read a contract,” then let AI do it.

Anthropic now has three design-related outputs. Claude Design, the product, handles the text-to-prototype canvas. The Design Plugin handles design workflow capabilities. The Frontend Design Plugin injects aesthetics into code output. The three layers give the same answer at three levels: the ceiling of design capability isn’t set by using a stronger model for one better generation. It’s set by whether you can decompose design judgment into manageable cognitive units and deliver each one at the right moment with the right context.

Skill-type products face a natural commoditization risk: once the workflow and evaluation criteria are standardized and published, the differentiation window starts closing. Anthropic chose to open-source these plugins, which suggests they believe the real value isn’t in the prompts themselves. It’s in who can turn evaluation frameworks into products—into distributable, customizable, context-connected systems for “knowing how to judge.” That’s probably the real reason Claude Design feels different. The model didn’t change. The organization of the work did.