This isn’t just about a technical quirk; it gets to the very heart of what “creativity” means for a machine and what the strategic implications are for everyone, from artists to the companies building these models.
So, What Are AI Image Generators, Really?
Before we get into the crux of the issue, let’s quickly align on what we are talking about. At their core, these tools are complex neural networks trained on vast datasets of images and their corresponding text descriptions, scraped from the internet. When you give a model like Stable Diffusion or Midjourney a prompt, it isn’t “understanding” your words in a human sense. Instead, it draws on statistical associations learned from that training data and generates a new image that fits the patterns linked to your description.
Think of it like a chef who has tasted millions of dishes but has never been given a single recipe. If you ask for a “celebratory cake,” they won’t invent a new flavour profile. They will synthesise the most common elements of all the celebratory cakes they’ve ever “tasted”—probably something with chocolate, vanilla, and sprinkles. It will be plausible, but it will be an amalgamation of the past, not a true invention. This is a critical distinction for understanding their limits.
The ‘Visual Telephone’ Experiment
This brings us to a fascinating piece of research published in the journal Patterns. Scientists designed an experiment that mimics the children’s game of ‘Telephone’ (or ‘Chinese Whispers’ as it’s known here in the UK). They started with an initial image prompt, fed it to an AI image generator (Stable Diffusion XL), and then had a second AI (LLaVA, a vision-language model) describe the resulting image. That new description was then used to generate the next image, and this cycle was repeated hundreds of times.
The idea was to see where the AI’s “imagination” would wander when left to its own devices. If human artists played this game, you’d expect wild, unpredictable variations based on individual interpretation and creative flair.
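For readers who think in code, the generate-then-describe cycle can be sketched in a few lines of Python. This is only an illustration of the loop’s shape, not the researchers’ code: the function names are hypothetical, and the two toy stand-ins simply truncate text so the sketch runs without any model. In a real run they would be replaced by calls to an image generator and a vision-language model.

```python
from typing import Callable, List


def visual_telephone(
    start_prompt: str,
    generate_image: Callable[[str], object],
    describe_image: Callable[[object], str],
    rounds: int = 100,
) -> List[str]:
    """Alternate image generation and captioning; return every prompt used."""
    prompts = [start_prompt]
    for _ in range(rounds):
        image = generate_image(prompts[-1])  # text -> image
        caption = describe_image(image)      # image -> text
        prompts.append(caption)              # the caption becomes the next prompt
    return prompts


# Toy stand-ins (assumptions for illustration only, NOT real models).
def toy_generate(prompt: str) -> dict:
    # The "image" keeps only the first three words: detail is lost each round.
    return {"content": prompt.split()[:3]}


def toy_describe(image: dict) -> str:
    return " ".join(image["content"])


history = visual_telephone(
    "a surreal melting clock tower at dusk", toy_generate, toy_describe, rounds=5
)
for round_number, prompt in enumerate(history):
    print(round_number, prompt)
```

Even in this toy version, whatever detail is dropped in one round never comes back, because each round only sees the previous round’s output.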
The Predictable Outcome
What the researchers found, as reported by Gizmodo, was the complete opposite of creative divergence. After about 100 rounds, regardless of the starting prompt, the images consistently converged into just 12 dominant visual styles. These weren’t exotic or imaginative; they were clichés. Think rustic lighthouses by the sea, dogs with pleading eyes, formal interior shots, and quaint, rural buildings.
The study’s authors aptly described the output as ‘visual elevator music’—bland, generically pleasing, and utterly unoriginal. This experiment powerfully demonstrates the emergence of generative art patterns not born from artistic choice, but from the statistical gravity of the model’s training data. The AI wasn’t exploring; it was falling back to its safest, most statistically probable aesthetic defaults.
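To make the idea of “statistical gravity” concrete, here is a deliberately simplified toy model, assuming styles can be placed on a single number line with three hypothetical “default” styles acting as attractors. It is not how the study measured convergence; it only shows how repeated small pulls towards the nearest common style erase the differences between starting points.

```python
# Hypothetical "default styles" on a 1-D style axis (an assumption for illustration).
ATTRACTORS = [0.1, 0.5, 0.9]


def step(x: float, pull: float = 0.3) -> float:
    """Move a style a fraction of the way towards its nearest attractor."""
    nearest = min(ATTRACTORS, key=lambda a: abs(a - x))
    return x + pull * (nearest - x)


def run(x: float, rounds: int = 100) -> float:
    """Apply the pull repeatedly, like successive rounds of the game."""
    for _ in range(rounds):
        x = step(x)
    return x


starts = [0.02, 0.27, 0.44, 0.61, 0.75, 0.98]  # six different starting "prompts"
finals = [round(run(s), 6) for s in starts]
distinct = sorted(set(finals))
print(f"{len(starts)} starting points ended in {len(distinct)} styles: {distinct}")
```

Six different starting points collapse into three end states. The real system is vastly more complex, but the qualitative behaviour, many inputs and few outcomes, is the same shape the researchers reported.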
The Inherent Limitations of AI Generators
This convergence isn’t a bug; it’s a feature of how these systems are designed. It shines a light on two fundamental issues: creative constraints and algorithmic bias.
Creative AI Constraints
The core issue is that these models are inherently derivative. They can only re-combine what they’ve already seen. Whilst the number of potential combinations is astronomically large, the machine has no genuine intent or understanding of aesthetics. It simply follows the path of least resistance towards the most common visual tropes in its dataset. This is one of the most significant creative AI constraints we face. The system is incentivised to produce something plausible that matches the prompt, not something genuinely novel that challenges convention. Novelty is, by definition, statistically unlikely.
Algorithmic Bias in Art
This leads directly to the problem of algorithmic bias in art. The AI doesn’t just learn patterns; it learns the biases embedded within its massive, human-made training data. If the internet is flooded with more pictures of idyllic lighthouses than, say, brutalist architecture in suburban settings, the model will naturally default to the former.
This results in a creeping visual style homogenisation. As more artists and designers use these tools, there is a real danger that our visual culture could become dominated by these 12-odd “default” styles. It’s a feedback loop: the AI produces generic images, which are then used in more web content, which could eventually be scraped to train the next generation of AI models, further reinforcing the same tired aesthetics. The diversity of visual expression, a hallmark of human culture, is at risk of being sanded down into a smooth, predictable paste.
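The feedback loop can also be illustrated with a small simulation. The sketch below assumes a made-up mix of four styles on the web and models each new generation of AI as slightly over-producing whatever is already common (the `sharpen` step, with an arbitrary exponent). The numbers are invented for illustration; only the direction of travel matters.

```python
import math


def sharpen(dist: dict, power: float = 1.5) -> dict:
    """Favour already-common styles: raise each share to a power, then renormalise."""
    weighted = {style: share ** power for style, share in dist.items()}
    total = sum(weighted.values())
    return {style: w / total for style, w in weighted.items()}


def entropy_bits(dist: dict) -> float:
    """Shannon entropy in bits: higher means a more diverse mix of styles."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


# Hypothetical starting mix of styles on the web (not real data).
web = {"lighthouse": 0.4, "portrait": 0.3, "interior": 0.2, "brutalist": 0.1}

entropies = []
for generation in range(5):
    entropies.append(entropy_bits(web))
    # The next model trains on the previous model's output.
    web = sharpen(web)

for generation, bits in enumerate(entropies):
    print(f"generation {generation}: diversity = {bits:.3f} bits")
```

Diversity falls with every generation, and the style that started out most common ends up dominating the mix.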
What This Means for Artists and the Industry
So, does this mean AI art is a dead end? Not at all. But it does reframe the role of the technology. The limitations of AI image generators suggest the tools are best used not as autonomous creators, but as assistants or brainstorming partners.
For an artist, an AI can be an incredibly powerful tool for rapid iteration. You can generate a dozen variations on a theme in minutes, helping you refine a concept. But the study shows that the spark of true originality—the injection of a weird idea, the deliberate breaking of a pattern, the unique personal history that informs an artistic choice—must still come from the human. The artist is no longer just the prompter; they are the curator, the editor, and the crucial source of variance that prevents the output from spiralling into cliché.
The balance is not between human versus machine, but between human creativity and AI-powered execution. The person who can master the tool whilst bringing their own distinct vision to the table will be the one who produces truly compelling work.
Looking ahead, the challenge for companies like Stability AI, OpenAI, and Midjourney is clear. How do you build models that can escape this aesthetic gravity? Perhaps it involves more sophisticated training techniques, or architectures that explicitly reward novelty. Or maybe it’s about creating tools that give artists more direct control to “push” the model out of its comfort zone.
What do you think? Is this homogenisation inevitable, or is it a temporary hurdle in the development of AI? How can artists best use these tools without falling into the trap of producing ‘visual elevator music’?


