Industry and Competition: Development Tools

AI-Driven UI Design Workflow: Cost Structure Analysis and Competitive Landscape

Survey session, 2026-04-21

Background and Problem

GPT-Image-2’s UI mockup generation capabilities have triggered extensive discussion on social media, but a single static image has limited commercial value on its own. The real question is: which cost structures within UI design is AI changing? Which stages have already been transformed, and which remain untouched? And among the dozen-plus products on the market, which segment is each one targeting?

This article starts from concrete scenarios in the design workflow, distills three underlying mechanisms that drive costs, and then uses those three mechanisms as a coordinate system to understand the approaches and competitive dynamics of AI tools.


Chapter 1: Why Design Workflows Are Expensive — Three Underlying Mechanisms

Starting from a Few Scenarios

Scenario 1: A founder tells a designer “I want a more premium feel, something like tessellate2d.com.” The designer opens the reference site, studies it, feels they understand, and spends three days producing a high-fidelity design. The founder looks at it and says “That’s not what I meant. I wanted that dark-background tech aesthetic, not this flat corporate look.” Three days wasted, back to square one. The problem is not the designer’s skill. The problem is that the three words “premium feel” carry too little information. The founder has a specific image in mind, but language cannot precisely convey that image — it can only convey a vague direction. The designer builds a concrete implementation based on that vague direction, and the gap between the two only surfaces after the implementation is complete.

Scenario 2: A designer spends two days in Figma building a high-fidelity landing page design. The overall direction gets approved. But during the review, the product manager points at the hero section and says “I want a bottom-to-top fade-in animation here, and the numbers need a counting scroll effect.” The designer cannot demonstrate this in Figma and can only add text annotations: “scroll-triggered fade-in, counter animation.” Two weeks later, the front-end developer implements it, but the animation speed, easing curve, and trigger timing all differ from what the product manager envisioned. Another round of revisions. The issue: Figma can only express static visuals. There is no low-cost way to express dynamic behavior — you have to wait until the code is written to see the actual result.

Scenario 3: The high-fidelity design passes review and enters development. The design includes a set of glassmorphism cards (translucent background + backdrop-blur), simulated in Figma using blur filters. The front-end developer needs to translate this visual effect into CSS: background: rgba(255,255,255,0.04); backdrop-filter: blur(20px); border: 1px solid rgba(255,255,255,0.08). Figma describes “what this thing looks like”; code describes “how to make the browser render this thing.” The translation between these two languages requires manual work from the developer, repeated for every component.

Scenario 4: After the site goes live, the founder looks at the actual result and says “The overall mood isn’t quite right. I think it should be darker, and the text hierarchy needs more contrast.” This is a directional adjustment, but at the code level it means changing dozens of color values and adjusting spacing and font sizes in over a dozen places. If this feedback had occurred at the moodboard stage, changing direction would have meant swapping a few reference images. The same feedback at the code stage costs two orders of magnitude more.

Three Underlying Mechanisms

These scenarios look different on the surface, but share common drivers. The reason UI design workflows are expensive, slow, and prone to rework comes down to three mechanisms.

Mechanism A: The manual nature of format conversion.

The design process is a path from vague to precise: mental image -> text description -> moodboard -> wireframe -> high-fidelity design -> interactive prototype -> production code. Each step is a format conversion, and each conversion requires a specific person using a specific tool to perform a manual translation. A moodboard does not automatically become a Figma file; a Figma file does not automatically become code. Scenario 3 is the classic manifestation: the designer described an effect in Figma’s visual language, and the developer had to manually translate it into code. Each translation introduces delay (waiting for someone’s availability), loss (information lost in translation), and deviation (two people interpreting the same visual effect differently).
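The Scenario 3 translation can be made concrete. Below is a toy TypeScript sketch of the mechanical core of that translation: a Figma-style layer description mapped, property by property, to CSS declarations. The `GlassLayer` shape and field names are invented for illustration (they are not the real Figma API), but this lookup-table character is exactly what tools like Builder.io or the Figma MCP Server automate.

```typescript
// Toy model of a Figma-style layer description.
// Field names are illustrative, not the real Figma API.
interface GlassLayer {
  fillColor: [number, number, number]; // RGB, 0-255
  fillOpacity: number;                 // 0-1
  backgroundBlurPx: number;            // Figma "background blur" effect
  strokeOpacity: number;               // white border opacity, 0-1
}

// The mechanical part of the designer-to-developer translation:
// each visual property maps to one CSS declaration.
function toCss(layer: GlassLayer): string {
  const [r, g, b] = layer.fillColor;
  return [
    `background: rgba(${r}, ${g}, ${b}, ${layer.fillOpacity});`,
    `backdrop-filter: blur(${layer.backgroundBlurPx}px);`,
    `border: 1px solid rgba(255, 255, 255, ${layer.strokeOpacity});`,
  ].join("\n");
}

// The glassmorphism card from Scenario 3:
const card: GlassLayer = {
  fillColor: [255, 255, 255],
  fillOpacity: 0.04,
  backgroundBlurPx: 20,
  strokeOpacity: 0.08,
};
```

The sketch also shows why the benefit depends on file hygiene: the mapping only works when the design file actually encodes properties as structured values rather than as flattened visual approximations.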

Mechanism B: The inverse relationship between fidelity and modifiability.

The more precisely something is expressed, the more expensive it is to change. The sentence “I want a dark tech aesthetic” can be instantly changed to “I want a warm academic feel,” but a Figma design that took three days requires another two days for the same adjustment, and a live website may need a week. Scenarios 1 and 4 both reflect this mechanism: design intent can only be validated at the high-fidelity stage (because you can only judge whether the “feel” is right when you see the concrete visual), but high-fidelity artifacts are expensive to change. This creates a dilemma: without high fidelity you cannot confirm direction, but with high fidelity you face the risk of directional rework.

This mechanism is the most fundamental tension in the entire design workflow. The traditional approach manages this risk through phased delivery (confirm direction at low fidelity first, then refine at high fidelity), but the cost is a longer process with translation overhead at every stage boundary (back to Mechanism A).

Mechanism C: Bandwidth limits of cross-medium communication.

Designers, product managers, founders, and developers need to convey design intent to each other, but every communication medium has a bandwidth ceiling. Language can convey intent and direction but not specific visual effects (Scenario 1: “premium feel” means entirely different visuals to different people). Static images can convey visuals but not dynamic behavior (Scenario 2: Figma cannot express scroll animation timing and easing). Interactive prototypes can convey behavior, but their production cost is high enough that most teams only build them at critical junctures. No single medium simultaneously achieves low cost, high fidelity, and complete expressiveness.

Scenario 2 simultaneously reflects Mechanism B and Mechanism C: the scroll animation the product manager wanted could not be expressed in Figma (Mechanism C’s bandwidth limit), could only be validated after code was written (Mechanism B’s high-fidelity requirement for confirmation), and code-stage modifications are expensive (Mechanism B’s modification cost).

How the Mechanisms Interrelate: The Cost Trap

The three mechanisms are not independent. They form an interlocking triangle: addressing any one mechanism shifts pressure onto the other two.

[Figure: Interlocking triangle of Mechanisms A/B/C]

Specifically:

You try to solve Mechanism C (cannot articulate intent clearly) -> you need a high-fidelity medium to express it -> this triggers Mechanism B (high-fidelity artifacts are expensive to change). In Scenario 1, if the founder could produce a precise visual reference instead of saying “premium feel,” the communication problem would be solved. But producing that image takes time, and once produced, changing direction is expensive.

You try to solve Mechanism B (make high-fidelity artifacts modifiable) -> one option is producing multiple versions for parallel exploration -> this triggers Mechanism A (each version requires a full round of manual conversion, with costs scaling linearly). In Scenario 4, if you wanted to try three color directions simultaneously and pick the best one, you would need to run the full design-to-code pipeline three separate times.

You try to solve Mechanism A (automate conversion, skip intermediate formats and go straight to code) -> code is the hardest medium for communicating modification intent -> this triggers Mechanism C (how do you explain “the scroll effect I want” to a developer?). In Scenario 2, even if AI could generate code directly and skip Figma, the animation effect the product manager wanted could still only be validated after the code was running.

The three mechanisms reinforce each other, trapping design teams in a cost loop: vast amounts of time spent on translation, waiting, and rework, with a low proportion of time devoted to actual exploration and creation. Understanding the structure of this trap makes it possible to see why AI tools have each chosen different entry points, and why no single tool can solve all three problems simultaneously.


Chapter 2: AI Progress Across the Three Mechanisms

Mechanism A: Format Conversion — Substantially Weakened

This is the mechanism where AI has made the most visible progress. Two paths are advancing in parallel.

The first path is automated translation: preserving the existing format ladder, but automating the translation between its rungs. Builder.io / Visual Copilot’s component mapping mechanism (mapping Figma components to real components that already exist in the codebase) and Figma MCP Server (letting Claude Code directly read color values, spacing, and component structures from Figma design files) have reduced the Figma-to-code translation workload by 30-60%. Figma’s own Make feature is doing the same thing. For teams with mature design systems, translation overhead has been downgraded from a primary bottleneck to a secondary issue.

The second path is more radical: skipping intermediate formats entirely. Vercel v0, Bolt.new, Lovable generate deployable code directly from natural language, completely bypassing the Figma design and interactive prototype stages. Claude Code can similarly generate front-end code directly from text descriptions (our QuackTech Innovation website — 1,530 lines of HTML/CSS/JS with glassmorphism, particle animations, and scroll animations — was generated by Claude Code in a single pass). For showcase-type websites, this path is already production-ready.

Where the limits are: Automated translation depends on how well-organized the Figma file is; teams without a design system benefit less. The skip-intermediate-formats path works well for showcase pages (Bolt.new’s landing page success rate is 92%), but success rates drop significantly for complex applications (down to 31% for complex SaaS), because the information content of complex applications exceeds what natural language can precisely convey.

Mechanism B: The Fidelity-Modifiability Inverse Relationship — Indirectly Mitigated

No AI tool has directly solved this mechanism (it is close to a physics-level constraint), but there are two forms of indirect mitigation.

The first is making high fidelity cheap. Google Stitch’s infinite canvas can generate five high-fidelity UI directions from a single prompt, driving the marginal cost of exploration toward zero. v0 and Bolt.new can generate a high-fidelity page within minutes. When the production cost of high fidelity drops low enough, making three directions in parallel no longer requires triple the time, and the fidelity-flexibility dilemma loosens.

The second is creating a new fidelity position. GPT-Image-2, through ChatGPT’s multi-turn conversation, can generate visually high-fidelity UI mockup images while maintaining extremely low modification cost (just say “the blue is too bright” in conversation). This creates a position on the traditional spectrum that did not previously exist: high visual fidelity, low modification cost, zero executability. Design intent can iterate repeatedly at this position until the direction is confirmed, then jump in one step to code implementation.

Where the limits are: Lower production cost does not equal lower modification cost. When the direction of an AI-generated high-fidelity page is wrong, the most economical approach is often to discard and regenerate rather than modify the existing version, because AI is currently better at generating from scratch than at making deep local modifications while maintaining overall consistency. For projects that have already accumulated substantial business logic, starting over is not an option. What AI needs there is the ability to keep 90% unchanged and precisely modify 10% — and this is harder at the code level than at the image level.

Mechanism C: Bandwidth Limits of Cross-Medium Communication — Largely Untouched

This is the mechanism where AI has made the least progress among the three, and therefore may be the next high-value entry point.

Two sub-problems are especially prominent under this mechanism.

The first is expressing dynamic behavior. Scroll animations, page transitions, hover microinteractions, and loading skeletons have no low-cost expression method anywhere in the design pipeline. Designers in Figma can only use static frames to hint “there should be a fade-in animation here”; product managers can only describe expectations as “I want that silky-smooth feeling.” The actual effect can only be seen after the code is written. Framer AI’s animation support is the best effort in this direction, but its scope is strictly limited to websites, with a limited ceiling on animation complexity. Claude Code can generate animation code, but the prerequisite is that you can precisely describe the desired effect in language — and most people cannot.
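To see why a static frame cannot carry this information, it helps to spell out what "scroll-triggered fade-in, counter animation" actually consists of at the code level. The TypeScript sketch below (names and values are illustrative, not from any real project) isolates the parameters: every one of them changes the "feel," and none of them has a home in a static mockup.

```typescript
// The information content of "scroll-triggered fade-in + counter animation":
// a bundle of numbers and curves. (All names and values are illustrative.)
interface ScrollReveal {
  thresholdVisible: number; // fraction of the element visible before triggering
  durationMs: number;       // animation length
  translateYPx: number;     // starting offset of the bottom-to-top fade-in
}

const heroReveal: ScrollReveal = {
  thresholdVisible: 0.2,
  durationMs: 600,
  translateYPx: 24,
};

// Easing curve: the part of "silky-smooth" that a text annotation cannot pin down.
const easeOutCubic = (t: number): number => 1 - Math.pow(1 - t, 3);

// Displayed value of a count-up number at a given moment of the animation.
function counterValue(target: number, elapsedMs: number, durationMs: number): number {
  const t = Math.min(elapsedMs / durationMs, 1); // normalized progress, clamped to 1
  return Math.round(target * easeOutCubic(t));
}

// In a browser, an IntersectionObserver constructed with
// { threshold: heroReveal.thresholdVisible } would start a
// requestAnimationFrame loop that calls counterValue each frame.
```

The product manager's dispute in Scenario 2 ("the animation speed, easing curve, and trigger timing all differ") is a disagreement over exactly these values, which is why it can only surface once runnable code exists.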

The second is translating vague feedback into precise modifications. A large share of feedback in design reviews is at the feeling level (“it’s too dense here,” “it doesn’t breathe enough,” “the overall mood isn’t quite right”), but modifications need to be executed at the pixel or code level (spacing from 16px to 24px, background color from #0a0e1a to #0d1225). This translation from vague to precise depends entirely on the designer’s comprehension and experience. AI tools have almost no presence here.
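A hypothetical sketch makes the shape of the missing tool visible: a function that maps feeling-level feedback onto precise design-token changes. Everything here is invented for illustration (the token names, the feedback categories, and especially the mapping table); today this translation lives entirely in a designer's head, which is precisely the gap.

```typescript
// Hypothetical feedback-to-token translator. The mapping table is invented;
// no shipping tool performs this translation automatically today.
interface Tokens {
  sectionGapPx: number;       // vertical rhythm between blocks
  bodyToHeadingRatio: number; // type-scale contrast
  background: string;         // page background color
}

type Feedback = "too dense" | "needs more hierarchy contrast";

function applyFeedback(tokens: Tokens, feedback: Feedback): Tokens {
  if (feedback === "too dense") {
    // "doesn't breathe" family -> widen spacing, e.g. 16px -> 24px
    return { ...tokens, sectionGapPx: Math.round(tokens.sectionGapPx * 1.5) };
  }
  // "mood isn't quite right" family -> steepen the type scale
  return { ...tokens, bodyToHeadingRatio: tokens.bodyToHeadingRatio * 1.25 };
}
```

The hard part is not the arithmetic but the mapping itself: deciding that "too dense" means spacing rather than font size, and by how much, is the taste judgment the text above describes.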

Why this is hard: The difficulty of Mechanism C lies in its interlocking with Mechanism B. Dynamic behavior is hard to communicate not only because there is no expression medium (Mechanism C) but also because the only medium that can fully express dynamic behavior (runnable code) is expensive to modify (Mechanism B). Breaking this interlock requires a new medium that is both low-cost and capable of conveying dynamic behavior. No such medium has appeared yet.

Summary

Mechanism | AI Progress | Representative Products
A. Manual nature of format conversion | Substantially weakened | Builder.io (automated translation); v0/Bolt.new/Lovable (skipping intermediate formats); Claude Code
B. Fidelity-modifiability inverse relationship | Indirectly mitigated | Google Stitch (making high fidelity cheap); GPT-Image-2 (creating a new fidelity position)
C. Cross-medium communication bandwidth | Largely untouched | Framer AI (limited animation support)

In one sentence: as of April 2026, AI has primarily solved efficiency problems on the production side (faster execution, cheaper translation), while communication-side problems (how to articulate what you want, how to turn vague feedback into precise modifications) have seen limited progress.


Chapter 3: Competitive Landscape — Three Bets on the Future

The dozen-plus AI design tools on the market appear to be doing their own separate things, but viewed through the triangle framework above, they are all answering the same question: how will the design workflow evolve? Each company’s answer differs, and these answers can be grouped into three bets.

Bet 1: Designers remain central, AI serves as an accelerator

Figma AI and Builder.io represent this direction. Their shared assumption is that the core value of design lies in taste judgment and systems thinking — capabilities that AI cannot replicate in the near term. AI’s role is to take over execution labor (building layouts, translating to code), freeing designers to focus on what AI cannot do.

This bet’s position in the triangle is clear: it primarily attacks Mechanism A (automated translation). Figma’s First Draft compresses the time from blank canvas to initial draft from hours to minutes; Make automates the design-to-code translation. Builder.io’s Visual Copilot is more precise: its component mapping mechanism maps Figma components to real components already in the codebase, rather than regenerating from scratch. Fusion 1.0 can even generate a PR directly from a Slack/Jira ticket.

The implicit cost of this bet: it does not change the structure of the triangle; it only shortens certain edges. Designers still need to produce high-fidelity designs in Figma (Mechanism B unchanged), and teams still communicate intent through static design files (Mechanism C unchanged). If this bet is correct, Figma’s moat deepens further. If AI eventually develops sufficient taste judgment, this path gets bypassed.

Bet 2: The design stage can be skipped entirely

Vercel v0 (6 million developers, $42M ARR), Bolt.new, and Lovable represent a more aggressive thesis: for a large number of scenarios, the design stage itself is unnecessary. Component systems (shadcn/ui + Tailwind) have reached a level of aesthetic quality that is good enough, and developers can get a usable interface by describing requirements in natural language.

This bet’s position in the triangle: it directly eliminates Mechanism A (no translation needed because there is no design file) while bypassing Mechanism B (no high-fidelity design file means no high modification cost problem). The cost is that all pressure shifts to Mechanism C: you can only describe what you want in natural language; if the description is imprecise, the output is wrong; and your way of modifying it is to describe it again in natural language.

The three products target different audiences: v0 targets React developers (code-first, tech stack locked to React + Next.js), Bolt.new targets full-stack needs (in-browser compilation, front-end + back-end + database), and Lovable targets non-technical founders (conversation-first, deep Supabase integration). Their shared limitation is visual homogeneity: generated interfaces all look roughly the same (blue accent color, Inter font, sidebar layout), suitable for functional interfaces but not for products requiring brand differentiation. They solve the functional usability problem, not the visual distinctiveness problem.

This bet also has a success-rate boundary. Bolt.new’s success rate on landing pages is 92%, but drops to 31% on complex SaaS applications. There is an upper limit to the information that natural language can precisely convey; beyond that limit (multiple states, multiple screens, complex interactions), the cost of skipping the design stage becomes apparent.

Bet 3: The boundary between design and development will dissolve

This is the most aggressive bet. It holds that the division of labor between design and development is itself a historical artifact that should be eliminated in the AI era.

Claude Design is the product in this direction with the clearest platform intent. Claude Design launched on April 17, 2026. On the surface, it is positioned to give people without design skills (founders, PMs, GTM teams) their first visual expression. But the handoff bundle mechanism between Claude Design and Claude Code reveals a larger intent: from idea to prototype (Design) to implementation (Code) to collaboration (Cowork), Anthropic is building a complete pipeline that stays entirely within its own platform. Mike Krieger’s resignation from the Figma board three days before the Claude Design launch was not a coincidence.

Claude Design’s position in the triangle is unique. It is not attacking a single mechanism; it is attempting to make the triangle itself irrelevant: if going from idea to running product requires only one round of conversation, then format conversion (Mechanism A) ceases to exist, the modification cost of high fidelity (Mechanism B) is no longer a bottleneck (because regenerating is cheaper than modifying), and the communication bandwidth limit (Mechanism C) is partially absorbed by AI’s comprehension capabilities. How far this path can go depends on how much AI’s design judgment can improve, but the direction is clear.

Framer AI dissolves the design-development boundary from a different angle: letting designers complete high-fidelity design and code publishing in the same environment, with the output being a live website directly. Its canvas approaches Figma’s freedom, and animations and interactions are first-class citizens. It is the only product that takes Mechanism C seriously (animations can be previewed and adjusted directly in the design environment), but its scope is strictly limited to websites.

Google Stitch approaches the problem from the exploration cost angle. Its Vibe Design concept drives the marginal cost of high-fidelity exploration toward zero: one prompt generates five directions, voice input enables real-time adjustments, and the Annotate feature allows circling and marking areas directly on the generated UI for AI to modify. Completely free (550 generations per month), with code export supporting seven frameworks. Its strategic logic is similar to Claude Design’s (using design as an entry point to funnel users into Google’s development ecosystem), but as a Labs experimental product, long-term stability is uncertain.

ChatGPT + GPT-Image-2: A Unique Position

GPT-Image-2 does not belong to any of the three bets above, because it is not a design tool per se. Its output is images, not code, not Figma files, not interactive prototypes. But within the triangle framework, it does something that no other tool can: it creates a new equilibrium between high-fidelity visual expression and low modification cost.

The key is ChatGPT’s multi-turn editing capability. After a user generates a UI mockup, they can use the brush or rectangle tool to select a local region and describe modifications in natural language (“change this button’s color to blue,” “add a navigation column on the left”). The model only modifies the selected area, keeping everything else consistent. The entire process is incremental: there is no need to re-describe the entire composition and style each time, only the changes. This is closer to the mental model of working in Figma, except the interaction medium has shifted from mouse-based drag-and-drop to natural language plus region selection.

Gemini’s image model (Nano Banana Pro) also supports multi-turn editing, but trails ChatGPT on two dimensions. First, iterative consistency: ChatGPT is more stable at preserving faces and details across multiple editing rounds, while Gemini is more prone to drift after several consecutive edits. Second, speed: GPT-Image-2 typically takes 10-25 seconds per image, Nano Banana Pro approximately 20-40 seconds, a gap that compounds in design exploration scenarios requiring dozens of iterations.

The significance of this capability: in the past, a visual direction might have allowed only 3-5 variants to be explored (because each variant required either prompting from scratch or manual modification in Figma). Now the same amount of time can produce 20-30 variants, widening the screening range and improving decision quality. Figma, Canva, and Adobe are already integrating GPT-Image-2’s API, indicating that the industry judges this capability sufficient for embedding in production toolchains.

GPT-Image-2’s strategic significance lies not in itself but in its connection to downstream tools. If Claude Code can generate a stylistically consistent webpage from a single visual direction image, then GPT-Image-2 + Claude Code constitutes a two-step pipeline from mental image directly to production code, bypassing Figma and prototyping entirely. This pipeline is already viable for showcase-type websites.

The True Gap in the Landscape

The three bets cover different parts of the triangle, but one area has almost no one working on it: Mechanism C's two hardest sub-problems, designing dynamic behavior and translating vague feedback into precise modifications.

There is currently no low-cost way for non-developers to precisely express “I want this kind of scroll animation.” Nor is there a tool that can automatically convert feeling-level feedback like “it’s too dense here” into a precise modification of spacing from 16px to 24px. These two problems are hard because they sit at the interlock point of Mechanisms B and C: the only medium that can fully express dynamic behavior (runnable code) is expensive to modify, while low-cost media (language, images) cannot convey dynamic behavior.

Whoever finds a breakthrough at this interlock point will be entering a market with no competition today.