
Making Creative Tools Agent-Native: From Photoshop Actions to Claude for Creative Work

2026-04-28

What Happened Today

On April 28, 2026, Anthropic released Claude for Creative Work, a product suite comprising 9 creative tool Connectors that allow Claude to directly control professional creative software including Adobe Creative Cloud, Blender, Autodesk Fusion, SketchUp, and Ableton. Users issue natural language instructions in the Claude interface, and Claude executes operations through the Connectors by calling the software’s backend APIs.

Take Blender as an example. Blender’s development team built an official MCP connector based on Blender’s Python API, enabling Claude to query scene data and object information, create and modify 3D objects, batch-apply changes, run custom Python scripts, and add new tools to the Blender interface. According to The Verge, Anthropic also joined the Blender Development Fund as a Corporate Patron, contributing at least 240,000 euros annually to support continued development.

Here’s what the actual user experience looks like. A demo on Hacker News showed a user typing “create a small village, five huts around a bonfire, a river on the left with a wooden bridge over it, trees scattered around,” and Claude generating a multi-object scene with correct spatial relationships. It can understand spatial constraints like “the bridge needs to span the river,” set up sunset lighting and orbiting camera animations, and respond to iterative modifications (“change the huts to stone houses”). All geometry is generated locally in the user’s Blender instance; only lightweight scene description data is sent to Claude’s cloud.
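
To make the demo concrete, here is a minimal sketch of the kind of bpy code such an instruction translates into. The demo doesn’t show the Connector’s generated code, so the helper name, primitive choices, and spacing below are illustrative assumptions, not the product’s actual output:

```python
# Illustrative only: a toy version of "five huts around a bonfire,"
# built from primitives via Blender's Python API (bpy).
import math
import bpy

def add_hut(location):
    # Stand-in hut: a cube body with a cone roof resting on top.
    bpy.ops.mesh.primitive_cube_add(size=2.0, location=location)
    roof = (location[0], location[1], location[2] + 1.6)
    bpy.ops.mesh.primitive_cone_add(radius1=1.6, depth=1.2, location=roof)

# Bonfire placeholder at the center of the scene.
bpy.ops.mesh.primitive_uv_sphere_add(radius=0.4, location=(0, 0, 0.4))

# Five huts evenly spaced on a circle, satisfying the prompt's
# "five huts around a bonfire" spatial constraint.
for i in range(5):
    angle = i * 2 * math.pi / 5
    add_hut((6 * math.cos(angle), 6 * math.sin(angle), 1.0))
```

Even this toy version shows why the local-execution split makes sense: the heavy geometry lives entirely in the user’s Blender instance, while only a short script’s worth of scene description crosses the wire.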

This didn’t come out of nowhere. Eleven days earlier (April 17), Anthropic released Claude Design, an AI design workspace aimed at non-designers, which caused Figma’s stock to drop roughly 7% on the same day. Claude Design targets people who can’t design (going from text to visual output), while today’s Connectors target professionals already proficient with their tools (embedding AI assistance into existing workflows). Together, the two products represent Anthropic’s comprehensive move into the creative domain.

But this isn’t Anthropic’s invention. The community has been making creative tools agent-native for over two years. What Anthropic did is consolidate scattered community experiments into a productized experience, then use brand relationships and partnerships to push it to a wider audience.

Three Generations of Evolution

Generation 1: LLMs Augmenting Existing Script Interfaces (2023-2024)

Creative software had mature programmable interfaces long before the AI era. Blender has exposed a complete Python API (the bpy module) since 2009 (version 2.5). Photoshop has Actions and JSX/UXP, Maya has MEL/Python, Houdini has VEX, After Effects has ExtendScript. These interfaces originally served Technical Directors and Pipeline TDs, requiring simultaneous mastery of creative tools and programming. The barrier was high enough that Springer published a 486-page book dedicated to the topic in early 2025.

After GPT-4 and Claude were released, these existing script interfaces suddenly gained a new entry point: having LLMs write the scripts. The barrier dropped from “knows Python and bpy” to “can describe what they want.” The earliest form was manual copy-paste: ChatGPT generates a script, the user pastes it into Blender and runs it. Blender’s built-in command-line arguments also allow scripts to run without ever opening the GUI. Later, people built AI assistant plugins embedded in Blender (e.g., Blender AI Assistant), enabling direct conversation and execution within the interface. Two years ago I did something similar myself, using Cursor to call Blender’s Python API to generate 3D demo videos.
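
For concreteness, the headless variant of that workflow looks like this. The `--background` and `--python` flags are real Blender CLI options; the script contents are an illustrative sketch:

```python
# Run an LLM-generated script without opening the GUI. Both flags below
# are standard Blender CLI options; only the script name is illustrative:
#
#   blender --background --python generated_scene.py
#
# generated_scene.py is plain bpy code. Writing a still render to disk
# was the typical last step, since a human had to inspect the result.
import bpy

bpy.context.scene.render.filepath = "/tmp/preview.png"
bpy.ops.render.render(write_still=True)
```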

This generation proved that LLMs can effectively translate natural language into creative software API calls. But it had a fundamental limitation: the LLM handles code generation, not verification. A script might run (no errors), but the rendered result could be completely wrong. A developer at Atomic Object tested this and found that AI works for geometry operations, but results become unpredictable for tasks involving materials, lighting, and other aspects requiring visual judgment.

This is precisely the core problem we raised in our analysis of the Agentic AI deployment crisis: AI’s self-iteration loop is broken. Code running without errors doesn’t mean the result is correct. If AI can’t see the rendered output, it can’t judge quality, and therefore can’t self-correct.

Generation 2: The Connection Protocol Layer (2025-Early 2026)

The bottleneck of Generation 1 wasn’t the LLM’s code generation capability, but the primitive communication method (generate script, manually paste, run, manually inspect, tell LLM the result). Generation 2 uses standardized protocols for direct bidirectional LLM-to-software communication, eliminating human intermediation.

Several landmark events define this generation. Anthropic released MCP (Model Context Protocol), defining a standard interface for AI to call external tools. In early 2025, Siddharth Ahuja released the community BlenderMCP (currently v1.5.5), enabling direct bidirectional communication between Claude and Blender via sockets. 3D-Agent built on this with a multi-agent architecture: one model handles high-level reasoning, another translates into bpy code, significantly improving success rates.
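
The core mechanism of these bridges is simple enough to sketch. What follows is not BlenderMCP’s actual protocol (its message format and port are not reproduced here); it’s a minimal illustration, with assumed message names, of how a socket bridge running inside Blender can receive code from an agent and report results back:

```python
# Minimal socket-bridge sketch (assumed protocol, not BlenderMCP's).
# Runs inside Blender's Python environment.
import json
import queue
import socket
import threading

import bpy

jobs = queue.Queue()  # code received from the agent, awaiting execution

def listen(port=9876):
    # One JSON request per connection: {"code": "<bpy script>"}.
    # A single recv keeps the sketch short; real bridges frame messages.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind(("127.0.0.1", port))
    server.listen(1)
    while True:
        conn, _ = server.accept()
        request = json.loads(conn.recv(65536).decode("utf-8"))
        jobs.put((conn, request["code"]))

def drain_jobs():
    # Execute queued code on Blender's main thread: bpy is not
    # thread-safe, so the socket thread only enqueues work.
    while not jobs.empty():
        conn, code = jobs.get()
        try:
            exec(code, {"bpy": bpy})
            reply = {"status": "ok",
                     "objects": [o.name for o in bpy.data.objects]}
        except Exception as exc:
            reply = {"status": "error", "message": str(exc)}
        conn.sendall(json.dumps(reply).encode("utf-8"))
        conn.close()
    return 0.1  # timer re-runs every 100 ms

threading.Thread(target=listen, daemon=True).start()
bpy.app.timers.register(drain_jobs)
```

Note what the reply contains: object names and error strings. That is exactly the limitation Generation 2 leaves in place; the agent gets textual feedback but never a look at the rendered frame.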

Then, in early 2026, the HKUDS team at the University of Hong Kong released CLI-Anything, which automatically analyzes GUI software source code and runs a 7-phase pipeline to generate agent-native CLI interfaces, now covering dozens of applications. Our previous survey of it highlighted a key insight, “The Rendering Gap”: an agent modifies a project file, but if verification renders through a simplified path, the modification’s effect may never appear, so the agent believes the operation succeeded when nothing actually changed.

This generation solved the connection problem. Agents can now send commands directly. But the feedback loop remained broken; agents still couldn’t see the visual effects of their operations.

The protocol layer itself is also in rapid flux. MCP, which started from a research-oriented design, has repeatedly hit walls in engineering practice: OpenAI resorted to the _meta field to work around context window limitations, and in March 2026 Feishu and DingTalk chose to publish CLIs rather than MCP servers as their preferred agent integration path. Competition at the protocol layer is far from settled.

Generation 3: Closed-Loop Agents (2026-)

The defining characteristic of Generation 3 is closing the feedback loop: agents don’t just operate software, they also perceive the results of their operations and self-correct accordingly.

This isn’t something Anthropic invented today. Multiple community projects have been building this. I built an Unreal Engine bridge based on Mengzhou’s UE MCP, and its core design is the feedback tool: after an AI agent issues an operation command, it requests a viewport screenshot, sees the visual result, then decides the next step. This “operate, screenshot, evaluate, operate again” cycle is the concrete implementation of a closed feedback loop. 3D-Agent discussed a similar approach in the Blender Developer Forum: adding a visual feedback channel for agents.
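
The control flow of that cycle is worth writing down, because the loop structure is the whole idea. Every function below is a hypothetical placeholder rather than any bridge’s real API:

```python
# Sketch of the "operate, screenshot, evaluate, operate again" cycle.
# All three helpers are hypothetical stand-ins for a bridge's tools.
from dataclasses import dataclass

@dataclass
class Evaluation:
    done: bool         # does the rendered result satisfy the goal?
    next_command: str  # if not, the proposed correction

def send_command(command: str) -> None:
    """Hypothetical: forward an operation to the creative tool."""
    raise NotImplementedError

def capture_viewport() -> bytes:
    """Hypothetical: request a screenshot of the current viewport."""
    raise NotImplementedError

def evaluate(goal: str, screenshot: bytes) -> Evaluation:
    """Hypothetical: a multimodal model compares the screenshot
    against the goal and proposes a correction if it falls short."""
    raise NotImplementedError

def closed_loop(goal: str, first_command: str, max_steps: int = 10) -> bool:
    command = first_command
    for _ in range(max_steps):
        send_command(command)                 # operate
        screenshot = capture_viewport()       # screenshot
        verdict = evaluate(goal, screenshot)  # evaluate
        if verdict.done:
            return True
        command = verdict.next_command        # operate again
    return False  # give up after a bounded number of corrections
```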

What Anthropic released today is the productized version of this generation. Compared to community projects, its advantage lies in integration (the Connector platform already in place, Claude’s multimodal capabilities mature, the Blender team officially building the connector), but the underlying approach is identical: bidirectional communication where the agent doesn’t just send commands but also reads back state and results.

The 3D-Agent author summarized the practical challenges of building closed loops: separating reasoning and code generation into different models works better; RAG over bpy documentation solves only about 50% of problems; math-intensive operations remain a weakness. Members of the Blender Artists community noted that these tools are more valuable during learning and exploration phases (explaining errors, analyzing scenes) than for directly generating final output.

A Judgment Framework

The three generations above can be compressed into a judgment framework. Making creative tools agent-native requires three components to be in place, in sequence:

Component 1: Programmable Interface. The software backend can be programmatically controlled. Blender’s bpy, Photoshop’s UXP, Figma’s Plugin API. Without this layer, AI has no entry point for manipulation. Creative software, due to historical pipeline automation needs, had this component in place earlier and more completely than most domains.

Component 2: Natural Language Translation and Bidirectional Connection. LLMs translate natural language into API calls, while a protocol layer enables direct AI-to-software communication. MCP, CLI-Anything, and various socket bridges all serve this layer.

Component 3: Perception and Evaluation Closed Loop. The agent sees operation results, judges whether they meet expectations, and corrects if they don’t. This requires multimodal perception (seeing rendered images), quality evaluation criteria (knowing what “good” means), and reasoning capability (translating evaluation results into corrective actions).

When you encounter a new creative AI announcement, use this framework to identify which component it fills and which it still lacks; that quickly positions the announcement on the evolutionary timeline.

So which of these three components is the actual bottleneck today? Component 1 has long been in place for creative tools. Component 2 has been largely solved over the past year-plus by MCP and various community projects. What’s truly blocking the entire field is Component 3: the perception and evaluation closed loop.

A common misjudgment is worth flagging here. Many assume the bottleneck is the model’s “design capability” or “spatial understanding,” believing models need to be smarter for creative tasks. The reality is that current mainstream models (Claude, GPT-5.5, Gemini) already have sufficient foundational capabilities in spatial understanding, code generation, and image comprehension. They can write correct bpy scripts, understand spatial relationships in scenes, and generate reasonable layouts from descriptions. The bottleneck isn’t “can AI understand what you want” but “can AI see what it produced and judge whether it’s correct.”

This also means Anthropic has no insurmountable moat here. If OpenAI wanted to do the same thing, all the underlying components are available: GPT-5.5’s multimodal capabilities can view rendered images, MCP is an open protocol anyone can use (OpenAI is already supporting MCP), and Blender’s Python API is open to everyone. Feishu and DingTalk choosing the CLI path already demonstrated that the connection layer isn’t an exclusive resource. The key components have matured to the point where any major player can assemble a similar product.

What makes Anthropic’s release today meaningful isn’t that it did something no one else could, but that it consolidated scattered community explorations into a consistent, brand-endorsed product experience, using commercial partnerships (Blender Patron) and product integration (Connector platform) to reduce user friction. This is a lead in execution and productization, not in technical capability.

Practical Boundaries and Human Roles

Based on community usage reports, current tools (whether Anthropic’s Connectors or the community’s BlenderMCP/3D-Agent) have clear capability boundaries.

Scenarios where they work well: procedural modeling, scene layout, spatial relationship handling, format conversion and export, analyzing and debugging existing scenes, explaining error messages. These tasks have clear success criteria and don’t depend on subjective aesthetics.

Scenarios where they fall short: original concept design, aesthetic adjustment of materials and lighting, complex character animation, narrative composition, brand style consistency.

For most creative professionals, the most practical current use isn’t “letting AI do your creative work” but “letting AI handle the non-creative parts of creative work.” A large portion of scene layout time goes to repetitive physical arrangement. Rendering pipeline work is mainly parameter configuration and queue management. Asset management is format conversion and naming conventions. Automating these processes doesn’t require AI to have aesthetic capabilities; it only needs to correctly execute instructions and report back the results.
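
As a concrete example of that last category, here is a sketch of a batch asset-export chore. The glTF export operator is real bpy API; the directory and naming convention are assumptions about a hypothetical pipeline:

```python
# Batch-export every mesh in the scene to glTF with a consistent naming
# convention. Export path and naming scheme are pipeline assumptions.
import os
import bpy

EXPORT_DIR = "/tmp/assets"  # assumption: wherever the pipeline expects files
os.makedirs(EXPORT_DIR, exist_ok=True)

for obj in bpy.data.objects:
    if obj.type != 'MESH':
        continue
    # Select exactly one object so each file contains a single asset.
    bpy.ops.object.select_all(action='DESELECT')
    obj.select_set(True)
    bpy.context.view_layer.objects.active = obj
    # Lowercase, underscore-separated names: "Stone House" -> stone_house.glb
    name = obj.name.lower().replace(" ", "_")
    bpy.ops.export_scene.gltf(
        filepath=os.path.join(EXPORT_DIR, f"{name}.glb"),
        use_selection=True,  # export only the selected object
    )
```

None of this requires taste; it only requires reliable execution and a readable report of what was exported.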

Looking at the pattern across three generations, each one eliminated friction at the execution layer. But none eliminated two things: deciding what to make (creative direction) and judging whether what was made is good enough (aesthetic taste). These are fundamentally subjective, experience-dependent, and bound to cultural context. Within foreseeable technical trajectories, they won’t be replaced by automation.

Once execution friction is eliminated, the source of competitive advantage shifts. The advantage of “who can execute faster” weakens, while “whose ideas are better” and “whose judgment is more accurate” become more important. For creative professionals: the value of tool mastery will decline; the value of taste and directional sense will rise.

Sources