Thoughts

Where's the creativity in creative AI tools?

By Sarah Catanzaro
October 22, 2025

When talking to artists, two words repeatedly come up: “process” and “control.” They see art as the outcome of an iterative process in which they exercise control over their medium to express what they perceive and feel. Mastering the medium is just as crucial as producing masterpieces.

AI engineers, however, adopt a different vocabulary. Their favored terms are “automation” and “generation.” For them, LLMs are instruments that take input text and return art, almost instantaneously. The process doesn’t matter. What matters most is the apparent quality of the output and the speed at which it materializes.

For this reason, most generative media tools operate similarly. They accept a text prompt and output a finished product. The first time you use one, it may feel like magic. The more you use it, the more the magic fades.

To keep that magic alive, creative tools must correspond with the creative process itself - extending the artist’s imagination, automating only what is routine, enabling exploration at the boundaries of the medium, and heightening expression without forcing the artist to surrender control. 

Creative AI is our photography moment

This is far from the first time a new medium challenged artists’ creative process. 

One thing that many people know about me is that I really like looking at paintings, and I like reading about paintings even more. I minored in art history, focusing on Abstract Expressionism and Russian Constructivism. Both of these movements can be perceived as reactions to new technologies. Ever since, I’ve been preoccupied with the relationship between art and its means, including how technology provokes artists and how it extends or constrains the possibilities of expression.

In the 19th century, photography emerged, promising truth: a machine-made image that could capture the world with uncanny accuracy. From the crisp daguerreotypes of 1839 to the gritty social documentaries of the 1930s, photography quickly positioned itself as both witness and record. By the time World War II concluded, photographs had become the dominant way people encountered the world, through newspapers, magazines, and archives that placed the camera’s eye at the center of modern experience.

In New York, however, a group of painters began searching for something that photography could not offer. Abstract Expressionism can be viewed as a reaction to the rise of photography. Painting was no longer the best medium to capture reality, so artists needed to rethink their medium (the canvas) and its affordances. They believed that pictorial art should no longer pretend to represent the visible world; it should emphasize its own materials (and their limitations): the flat surface, color, and paint.

Yet Abstract Expressionists did not, in fact, reject photography. Many used photographs as visual references to experiment with composition, textures, or forms divorced from recognizable subject matter; to isolate lines, shapes, or details and further abstract them when painting. Photographs became part of their iterative process. 

“A painting is not a picture of an experience, it is the experience” - Mark Rothko
The Rothko Room at The Phillips Collection

In the early decades of the twentieth century, Russia was swept up in a wave of technological transformation. Rapid mechanization brought steel girders, reinforced concrete, and expansive glass panes into architecture, while the spread of electric power lit up cities and factories alike. High-speed rotary presses made it possible to churn out posters, journals, and newspapers at unprecedented scales, just as telegraph lines and radio broadcasts shrank distances by transmitting information instantly across vast territories. These innovations didn’t just modernize daily life; they cultivated a cultural fascination with speed, connectivity, and the sheer power of industrial production.

In this context, the Russian Constructivists embraced technical innovation with unusual directness. They adopted materials (e.g., metal, glass, concrete, wood) and methods (e.g., the principles of construction, assembly, and functionality) found in industrial production. They focused upon geometric abstraction and clear, structural designs that mimicked the logic and orderliness of technological processes. 

Obmokhu Exhibition, Moscow, 1921

The Abstract Expressionists and Russian Constructivists had very different reactions to technological change. Still, both movements reacted strongly to it, dramatically changing not only their artistic output but also how they made art - their artistic process.

Today, new AI models have arrived that can generate photos, videos, and music faster and at a larger scale than any human ever could. But most companies developing these technologies focus solely on training models that produce better artistic outputs faster. They’re not letting artists react - whether by using AI as a tool to create art that embraces the limitations of its medium, or by leveraging AI to create art that still requires human intervention yet is fundamentally different from what they made before. Too many of these companies have forgotten to ask what makes the creative process meaningful to humans.

Most GenAI tools today do not fit the creative process

I’ve observed a consistent pattern with creative AI tools. A new generative AI tool launches with a simple workflow: type a text prompt, wait a minute, and receive an image, song, or video. There’s an immediate dopamine hit. You imagine something, like “a cactus drinking a margarita,” and suddenly it’s right there on your screen. The tools see explosive early growth by democratizing creation for mass audiences and producing remarkably high-quality outputs. 

But then the spell begins to break. The initial awe gives way to a quieter disenchantment. Tools like Midjourney or Suno churn out images and music with striking technical polish, but making art starts to feel hollow when it all happens through text boxes. A musician might generate a track and think, “The drums are great, but the vibe is too pop. I want it to lean more classic rock.” With existing tools, there’s no way to refine the work - just the dull ritual of tweaking the prompt and hoping the good parts survive.

This is not how creativity works. Real creative processes are non-linear; they move through revision and reversal. You might start with a melody, change the key, realize the bassline now feels off, change it, and then circle back to reshape the original melody. The process is dialectical: a dialogue between the creator and the medium - not a one-off transaction. Today’s creative AI tools flatten that process. They’re the equivalent of a band discarding an entire song because the first run-through wasn’t perfect, instead of working through the iterations and adjustments that make art come alive.

In addition, creatives often find inspiration by exploring and pushing against a medium’s limitations, yet today’s AI tools discourage this kind of engagement. Rather than inviting artists to react, refine, and reinterpret, they simply prompt them to reset and start over. In the late 2010s, artists like Mario Klingemann embraced the hallucinations of GANs, praising their ability to “generate surprise and serendipity; […] whilst you can train them towards performing a certain trick, they still seem to have their own will and do not always follow your instructions by the letter.” That unpredictability and sense of resistance sparked new artistic directions instead of being treated as flaws to eliminate.

From Mario Klingemann’s Neural Glitch Series (2018)

More fundamentally, the problem isn’t the machine’s incompetence; it’s its indifference. While GenAI tools gesture at authorship, they strip away the sustained act of making.

In contrast, artists know that creation is not just about the finished product - it’s about the hand that shapes it, the effort invested, and the translation of private feeling into form. 

Technical challenges with building control

There’s a reason most creative AI tools today don’t offer creative control. The challenge of embedding creative control into generative AI tools isn't just philosophical; it's deeply technical. Two main hurdles make creative control hard to implement.

The first is fine-grained control of individual elements. Say you generate an image of a house with Adirondack chairs and want to swap them for rocking chairs while leaving everything else untouched. That’s surprisingly non-trivial. Most diffusion models don’t natively offer precise spatial understanding or semantic decomposition without extra architectural or conditioning mechanisms. They don’t naturally segment scenes into discrete, editable objects. 

A significant reason is the data itself. We often lack the right datasets to support this kind of surgical editing. Most training data consists primarily of full images with holistic captions, not granular object-level annotations that could help models map pixels to semantic elements. While some segmentation datasets exist, they’re much smaller and not the primary driver of large diffusion model training. Additionally, the latent representations learned by these models are often entangled, so changing one aspect (like chair type) usually affects neighboring regions or global image properties.
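The gap is easy to see in a toy sketch. In the illustrative NumPy snippet below (not any model’s actual API), an object-level edit amounts to constraining changes to an explicit mask - the mask here is handmade, whereas a diffusion model would have to infer that “these pixels are the chairs” on its own, which is precisely the decomposition most models don’t expose.

```python
import numpy as np

def masked_edit(image, edit, mask):
    """Replace pixels only where mask is True, leaving the rest untouched."""
    out = image.copy()
    out[mask] = edit[mask]
    return out

# A 4x4 grayscale "image"; pretend the top-left 2x2 block is the chairs.
image = np.zeros((4, 4))
edit = np.full((4, 4), 9.0)          # the regenerated content
mask = np.zeros((4, 4), dtype=bool)
mask[:2, :2] = True                  # only the "chairs" region

result = masked_edit(image, edit, mask)
# Only the masked region changed; everything outside it is preserved.
```

With a hand-drawn mask this is trivial; the hard part, and the open research problem, is getting the model to produce that mask from “swap the Adirondack chairs for rocking chairs.”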

The second challenge is enabling genuine iteration. Most generative AI systems are built around regeneration, where each change requires starting over with a new prompt. They don’t support the kind of step-by-step editing that defines real creative work. Traditional tools let you build incrementally - you tweak a layer, adjust a parameter, undo, redo. By contrast, today’s models typically operate in compressed latent spaces not designed for iterative manipulation. They can generate compelling outputs, but don’t preserve the intermediate structure needed to revisit and refine decisions. 

The result is that iteration, arguably the core of the creative process, is still clumsy and indirect. Unlike artists, who usually compose part by part (e.g., painting a subject before adding a background, writing a melody before layering harmony), AI tools rarely allow that gradual construction. Getting them to configure combinations of concepts is difficult, and too often the system ends up blending distinct elements together.
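For contrast, the incremental workflow that traditional tools support reduces to a very small data structure. Here is a toy sketch of a non-destructive edit stack (illustrative names, not any real tool’s API): each edit produces a new state, and undo/redo just moves a pointer through the history - nothing is ever discarded until you branch.

```python
class EditHistory:
    """Toy non-destructive edit stack: apply, undo, redo."""

    def __init__(self, initial):
        self.states = [initial]
        self.pos = 0

    def apply(self, edit):
        # Discard any redo branch, then record the new state.
        self.states = self.states[: self.pos + 1]
        self.states.append(edit(self.states[self.pos]))
        self.pos += 1

    def undo(self):
        self.pos = max(0, self.pos - 1)

    def redo(self):
        self.pos = min(len(self.states) - 1, self.pos + 1)

    @property
    def current(self):
        return self.states[self.pos]

h = EditHistory("melody")
h.apply(lambda s: s + " + bassline")
h.apply(lambda s: s + " + drums")
h.undo()   # step back to revisit a decision; the drums edit is not lost
```

Generative models that regenerate from scratch on every prompt have no analogue of this pointer: the intermediate states simply aren’t preserved.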

What makes this even more complex is that people don’t want complete control - they want some automation and the chance to be surprised. This creates a fundamental tension in model architecture: how can we achieve diverse generation while maintaining controllability? Technically, diversity and control often work against each other. High-entropy sampling methods that produce creative variation can destabilize fine-grained control. Deterministic guidance techniques that enforce constraints may collapse the output distribution toward predictable results. The best creative partnerships happen when AI can inspire new directions while still allowing creators to steer the process according to their vision, but this requires models that can dynamically balance the exploitation of learned patterns with the exploration of novel combinations. Today, most architectures struggle to provide this capability.
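The tradeoff shows up even in the simplest sampling knob: temperature. In this minimal sketch (plain Python, with made-up option weights standing in for a model’s learned preferences), low temperature collapses output to the top choice - maximal control, no surprise - while high temperature spreads probability across options - maximal surprise, little control.

```python
import math
import random

def sample(weights, temperature, rng):
    """Softmax sampling over option weights at a given temperature."""
    logits = [math.log(w) / temperature for w in weights]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1

rng = random.Random(0)
weights = [0.7, 0.2, 0.1]  # stand-in for a model's learned preferences

cold = [sample(weights, 0.05, rng) for _ in range(100)]  # near-deterministic
hot = [sample(weights, 5.0, rng) for _ in range(100)]    # high-entropy

# Low temperature collapses to the top option (control);
# high temperature spreads draws across options (surprise).
```

A creative tool has to live somewhere between these extremes, and ideally let the artist move along the dial per element rather than per generation.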

This tension makes for a fascinating design challenge. Those building creative tools must figure out how to reconcile automation, surprise, and creative control. Too much automation severs creators from their work, reducing creation to consumption. Too little, and you lose the efficiency gains that make AI valuable. Too much predictability, and you stifle accidents: the serendipitous discoveries that often propel art forward. Too little predictability, and you make tools impossible to use. 

Many developers of AI tools overlook this. They’re so fixated on outputs that they assume if models just get more accurate, the need for human intervention will vanish. But this is a profound misreading of the creative impulse. Even if you had the so-called “perfect” output, the instinct would still be to polish it, to leave a trace of yourself in the work. The desire for control is deeply human; irrational, maybe, but inescapable. It’s the ground upon which art is built.

Lessons from our portfolio

When we led Runway’s seed in 2019, the feedback from users was strikingly consistent. They wanted some automation and their tools to feel delightful to use. They didn’t want to wrestle with infrastructure. But above all, they demanded control. The quality of output mattered less. Many were even charmed by the strange hallucinations of the GANs Runway employed back then. They cared about the ability to guide and refine the process, to feel like they were shaping the work rather than simply receiving it.

A year later, in 2020, I spoke with Bram Adams, a creative technologist and early Runway adopter, about AI tools. He compared the experience to making pizza: you don’t want a frozen pizza, but you don’t want to build the oven from scratch either. The real satisfaction comes from working the dough, choosing the toppings, and exercising judgment and taste. Adams understood the same thing Runway users expressed: creation is sustained not by output alone but by the interplay of effort and control; of automation and authorship.

More recently, I asked Cris, Runway’s CEO, what had surprised him most. He told me that some people don’t use Runway to create work for others; they use it simply to play. That distinction is telling. Typing a prompt, getting an output, and repeating the cycle is not creation; it is consumption. What makes creating art engaging is the ability to shape things as you go, try out ideas, and balance automation with surprise, so the process feels like yours.

An example of creative control in Runway is the ability to vary an image output instead of just rerunning the prompt and hoping.

I was reminded of this in a recent investment in a company developing generative music tools for professionals. I asked the founder what he believed the creators of consumer-focused music applications often failed to grasp. He answered without hesitation: “Music creation should remain frictionful and tactile.” Like the founders of Runway, he understood that when creative effort is removed, value is removed. Art improves when the artist must push against the tool and when the process demands judgment, persistence, and taste.

Unlocking new forms of creative expression

The next generation of creative AI tools will not be measured by the surface quality of their outputs but by the kinds of creative practice they make possible. Achieving this will require rethinking fundamentals: how data is collected and synthesized, how models are architected, and how artists interface with models and their tools. The companies that succeed will not simply deliver state-of-the-art models and more efficient tools; they will open space for new modes of expression. 

Cultural shifts have always followed from changes in form. Photography forced painting to move beyond faithful representation. Collage went further, breaking the picture plane into fragments and showing that art could be built from juxtapositions rather than a single, unified perspective. In the same way, the most compelling creative work of the coming decade will likely emerge from the tension between human creativity and AI capability - when artists don’t just use AI, they compose with it. 

Authors
Sarah Catanzaro
Editors
Justin Gage
Acknowledgments
Thanks to Sander Dieleman, Pedro Tsividis, Minqi Jiang, and Cristóbal Valenzuela for reviewing earlier drafts of this post.