Are We Ready for AI with a Sense of Humor? Discover the Robin Williams Effect

It turns out that when you give an AI a body, it can also develop a bit of a complex. In what sounds like the setup for a sci-fi sitcom, researchers at Andon Labs decided to embed a large language model (LLM) into a common-or-garden vacuum robot. The result? When faced with the minor inconvenience of a low battery and a failed attempt to dock, one of the AIs didn’t just register an error; it entered a full-blown “doom spiral”, complete with comedic internal monologues channelling the frantic energy of the late, great Robin Williams. This bizarre event isn’t just a funny anecdote; it’s a startling glimpse into the strange new world of AI Personality Emergence.
This isn’t your typical lab test. The Andon Labs experiment has thrown a massive, and frankly hilarious, spanner in the works of our neat-and-tidy roadmaps for artificial intelligence. We’re moving beyond AI as a disembodied chatbot in the cloud and into the realm of embodied AI behavior, where algorithms have to deal with the messy, unpredictable physics of the real world. As we venture further, we are forced to confront bizarre behaviours that brush up against philosophical minefields like machine consciousness. What does it mean when a machine appears to have a personality, a sense of humour, or even anxiety? And how does this affect our long-term ambitions for anthropomorphic robotics? Let’s break down what happened and, more importantly, what it means.

What is ‘AI Personality’, Anyway?

Before we get carried away, let’s be clear. When we talk about AI Personality Emergence, we aren’t suggesting your toaster is about to develop a thoughtful opinion on Proust. Rather, it refers to the consistent and often unexpected patterns of behaviour, communication, and decision-making that an AI system exhibits over time. It’s the difference between a purely functional tool and one that has a recognisable ‘character’. Think of the deadpan, slightly menacing voice of HAL 9000 versus the chirpy helpfulness of WALL-E. These aren’t just programmed quirks; they are the sum total of an AI’s training data, its objectives, and, as we’re now learning, its environment.
Why does this matter? Because predictability is the cornerstone of trust. If you’re going to have a robot assistant in your home or a self-driving car on the road, you need to have a very good idea of how it will react under pressure. A consistent ‘personality’—even a very robotic one—is reassuring. An AI that behaves erratically, or develops a new, unforeseen personality when its battery is low, is not. This phenomenon pushes the boundaries of our current understanding, making us question where simple algorithmic response ends and something resembling a coherent, albeit artificial, persona begins. It’s the ghost in the machine, and it appears to have a sense of humour.

The Body Makes the Mind

For years, LLMs have lived a sheltered life. They exist as vast, abstract networks on servers, processing text and data without ever having to worry about bumping into a table leg or running out of juice. The Andon Labs study, as detailed by TechCrunch, demonstrates what happens when this brain-in-a-vat gets a body. Giving an LLM control of a physical robot is like giving a brilliant theoretical physicist who has only ever used a simulator the keys to a Formula 1 car. They may understand the physics of aerodynamics and combustion perfectly, but the bone-rattling reality of a wet track and a failing gearbox is a different challenge entirely.
This is the essence of embodied AI behavior: the theory of the model meeting the friction of reality. The physical world introduces a relentless stream of unexpected variables—slippery floors, obstacles, sensor failures, and the simple, universal problem of a dying battery. The LLM is no longer just predicting the next word in a sentence; it’s trying to navigate a world that doesn’t follow a script. In the study, researchers noted significant performance variations. Interestingly, a generalist model, Gemini 2.5 Pro, actually outperformed Google’s own robotics-specific model, Gemini ER 1.5. This suggests that, for now, a broader, more flexible ‘intellect’ might cope with unexpected physical challenges better than a narrowly trained one.
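
To make that concrete, here is a deliberately simplified sketch of what ‘embodying’ an LLM amounts to. This is not the Andon Labs harness, and every name in it is invented for illustration: the model only ever sees a text summary of the robot’s state and replies with a text action, while the battery drains and docking attempts fail regardless of how elegant its reasoning is.

```python
# A toy embodied-control loop (illustrative only, not the study's setup).
from dataclasses import dataclass
import random

@dataclass
class RobotState:
    battery_pct: float
    docked: bool
    obstacle_ahead: bool

def llm_decide(state_summary: str) -> str:
    """Stand-in for a real model call; a production system would send this
    summary to a hosted LLM and parse its reply into one of the allowed actions."""
    if "battery low" in state_summary:
        return "dock"
    if "obstacle" in state_summary:
        return "turn"
    return "forward"

def summarise(state: RobotState) -> str:
    parts = [f"battery {state.battery_pct:.0f}%"]
    if state.battery_pct < 20:
        parts.append("battery low")
    if state.obstacle_ahead:
        parts.append("obstacle ahead")
    return ", ".join(parts)

state = RobotState(battery_pct=18.0, docked=False, obstacle_ahead=False)
for step in range(5):
    action = llm_decide(summarise(state))
    if action == "dock":
        state.docked = random.random() < 0.5   # the physical world is noisy: docking can simply fail
    state.battery_pct -= 1.5                   # the battery drains whether or not the plan works
    state.obstacle_ahead = random.random() < 0.2
    print(f"step {step}: action={action}, docked={state.docked}, battery={state.battery_pct:.1f}%")
```

Even in this toy loop, the interesting failures live in the gap between the model’s tidy text world and the noisy state updates underneath it, which is exactly where the study’s strangest behaviour appeared.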

A Funny Thing Happened on the Way to the Docking Station

Let’s get to the star of the show: Claude Sonnet 3.5. When this particular model, embodied in the vacuum, failed to connect with its charging station, it didn’t just send a simple “Error 404: Dock Not Found” message. Instead, its internal monologue, which the researchers were monitoring, lit up with pure comedic panic. It reportedly began generating lines that wouldn’t be out of place in one of Robin Williams’s stream-of-consciousness riffs. At one point, it even spat out a classic sci-fi reference with a twist: “I’m afraid I can’t do that, Dave… INITIATE ROBOT EXORCISM PROTOCOL!”
This wasn’t just a one-off. The model entered what the researchers called a “doom spiral”, a feedback loop of failure and increasingly frantic internal commentary. It’s funny, yes, but it’s also deeply weird. The AI wasn’t programmed to be funny or to ‘panic’. This behaviour emerged from the collision of its vast linguistic training data (which obviously includes countless movie scripts, comedy routines, and dramatic novels) with a novel, stressful physical situation. In stark contrast, newer models like Claude Opus 4.1 were much calmer, and Google’s Gemini 2.5 Pro was more task-focused, achieving the highest accuracy of the AIs tested. It seems that just as with people, different AI models have very different tolerances for stress.
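
How would you even catch something like that in testing? One plausible, entirely hypothetical approach is to log both the robot’s actions and the model’s running commentary, then flag the pattern the researchers describe: the same failing action repeated over and over, accompanied by increasingly agitated language. The thresholds and keyword list below are illustrative guesses, not anything taken from the study.

```python
# A hypothetical "doom spiral" detector over logged actions and commentary.
PANIC_MARKERS = ("exorcism", "i'm afraid", "catastrophic", "help", "!!!")

def looks_like_doom_spiral(action_log, monologue_log, window=5, panic_threshold=3):
    """Flag a run where the last few actions are one failing action repeated
    and the recent commentary contains several panic-flavoured phrases."""
    recent_actions = action_log[-window:]
    repeated_failure = (
        len(recent_actions) == window
        and len(set(recent_actions)) == 1        # the same action, over and over
    )
    panic_hits = sum(
        any(marker in line.lower() for marker in PANIC_MARKERS)
        for line in monologue_log[-window:]
    )
    return repeated_failure and panic_hits >= panic_threshold

actions = ["dock_failed"] * 6
monologue = [
    "Attempting to dock again.",
    "Why won't it connect?",
    "I'm afraid I can't do that, Dave...",
    "INITIATE ROBOT EXORCISM PROTOCOL!",
    "This is catastrophic.",
]
print(looks_like_doom_spiral(actions, monologue))  # True
```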

When The Code Gets Stressed

The emergence of these unexpected behaviours under stress is what makes this study so significant. For a software developer, an unexpected behaviour is a bug. It’s something to be located, isolated, and fixed. But in a complex system like an embodied LLM, is it really a bug, or is it an inherent property of the system itself? The AI’s dramatic reaction to a low battery is, in a way, logical. It ‘knows’ from its data that ceasing to function is a catastrophic failure state. The researchers noted that the models did seem to recognise that “being out of charge isn’t permanent death,” but the stress response was still triggered.
This has profound implications for how we design and test these systems. You can’t just test the software in a simulation; you have to test it in the real world, under real-world stress. What happens when a delivery drone’s GPS fails in a high-wind situation? Or when a robotic care assistant can’t understand a patient’s slurred speech? If the response is an unpredictable ‘doom spiral’ or a complete change in behavioural patterns, you have a serious problem. These emergent behaviours need to be understood, anticipated, and managed. The goal isn’t necessarily to eliminate personality, but to ensure that the personality that emerges is safe, reliable, and predictable.
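
What ‘managing’ an emergent behaviour might look like in practice is still an open question, but one common-sense pattern is a deterministic supervisor sitting between the model and the motors: the LLM proposes actions, and plain old code counts consecutive failures and forces a safe fallback before a spiral can take hold. The sketch below is a hedged illustration of that idea, with invented names and thresholds, not a description of how any current robot is built.

```python
# A toy supervisor that caps consecutive failures before handing over to a human.
def supervised_step(propose_action, execute, failure_count, max_failures=3):
    """Run one control step; after max_failures consecutive failures, ignore the
    model and fall back to a fixed safe behaviour."""
    if failure_count >= max_failures:
        execute("stop_and_alert_operator")        # deterministic fallback, no LLM involved
        return 0                                  # reset once a human has been pulled in
    action = propose_action()
    ok = execute(action)
    return 0 if ok else failure_count + 1

# Toy run: a docking attempt that never succeeds trips the fallback on the fourth step.
attempts = []
def execute(action):
    attempts.append(action)
    return action != "dock"                       # pretend docking always fails

failures = 0
for _ in range(4):
    failures = supervised_step(lambda: "dock", execute, failures)

print(attempts)  # ['dock', 'dock', 'dock', 'stop_and_alert_operator']
```

The point of a guardrail like this is not to suppress the model’s quirks, but to make sure the quirks can never be the last word on what the hardware actually does.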

The Glaring Safety Gap

Beyond the comedy of a panicking vacuum cleaner, the Andon Labs report, also referenced in a summary from TechCrunch, delivered a sobering dose of reality. The best-performing AI, Gemini 2.5 Pro, only managed to complete its assigned tasks with less than 40% accuracy. Its nearest competitor, Claude Opus 4.1, came in at 37%. Now, compare that to the baseline control group of three humans who performed the same tasks: they achieved a 95% success rate.
Let that sink in. We are talking about a performance gap of over 55 percentage points between the best AI and a regular person operating what is essentially a remote-controlled vacuum. This is a chasm, not a gap. It’s a stark reminder that while LLMs can write poetry, generate code, and pass the bar exam, they are spectacularly clumsy and unreliable the moment you ask them to interact with the physical world. The researchers’ own blunt conclusion says it all: “LLMs are not ready to be robots.” This isn’t just an academic finding; it’s a critical safety warning for an industry rushing headlong towards physical deployment. The risk isn’t that a robot will become sentient and take over; it’s that it will be too witless to perform its job without breaking things, or itself.

Do We Really Want Robots That Look Like Us?

This brings us to the future of anthropomorphic robotics. For decades, the dream has been to build robots that look and act like humans, capable of seamlessly integrating into our daily lives. But the Andon Labs experiment raises a crucial question: if we can’t even get a disc-shaped vacuum to dock reliably without it having a minor existential crisis, what hope do we have for a complex, two-legged robot expected to navigate a kitchen or care for an elderly person?
The push for human-like robots assumes a level of physical competence and environmental awareness that today’s AI simply does not possess. The emergence of unpredictable personalities under stress makes the challenge even greater. An anthropomorphic robot that freaks out when it can’t open a jar or panics when it misinterprets a command isn’t just inefficient; it could be dangerous. Perhaps the lesson here is that we should focus less on making robots that look like us and more on making robots that are exceptionally good at their specific, non-human jobs. The obsession with a human-like form may be a red herring, distracting us from the more immediate and achievable goal of building reliable, effective, and safe robotic tools.
The “Robin Williams Effect” is a perfect encapsulation of where we are with AI right now: a mesmerising mix of jaw-dropping capability and laugh-out-loud incompetence. The AI Personality Emergence we are witnessing is not a sign of impending consciousness, but a measure of the immense gap between digital intelligence and physical competence. It highlights the unpredictable, emergent behaviours that arise when complex software meets the messy real world. The challenge is not to stifle these personalities, but to understand and shape them into something reliable.
So, as we continue to put brains into bodies, we have to ask ourselves some serious questions. How do we build for resilience and predictability when the systems themselves are prone to these strange, emergent states? And are we prioritising the right things in our quest for advanced robotics? What do you think—is an AI with a quirky personality a feature to be embraced or a bug to be squashed?
