Why Meta’s Visual World Models Could Change AI Forever

For all the chatter about large language models and their poetic, if occasionally unhinged, prose, the next real frontier for artificial intelligence isn’t about writing better sonnets. It’s about seeing, understanding, and predicting the physical world. This is the domain of visual world models, and Meta, despite its recent internal chaos, is making a very loud, very public bet that this is a race it can still win.
The company is reportedly working on two new flagship models, codenamed ‘Mango’ and ‘Avocado’, with a planned debut in 2026. This isn’t just another incremental update; it’s a fundamental strategic pivot towards AI that can truly perceive reality. But with rivals like Google and OpenAI seemingly miles ahead and a brain drain hollowing out its top labs, is Meta trying to build a new world or just desperately trying to stay on the map?

So, What on Earth is a Visual World Model?

Before we dive into Meta’s C-suite drama, let’s get our heads around the technology itself. For years, AI models have been fed a diet of text and static images. They can describe a picture of a cat, but they don’t understand the “catness” of a cat – its likely movements, its behaviour, the physics of its jump. They see snapshots, not the movie.
Visual world models change the game. Think of it like this: teaching a child about a ball by only showing them photos is one thing. Giving them a ball to see it roll, bounce, and react to a push is something else entirely. The child builds an intuitive model of physics and cause-and-effect. That, in essence, is what these models aim to do. They learn the underlying “rules” of the visual world by watching vast amounts of video, enabling them to simulate and predict what happens next.
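For the technically curious, here is a toy version of that "watch video, predict what comes next" objective, written in PyTorch. It is only a minimal sketch, loosely in the spirit of the joint-embedding predictive approaches Meta has published (such as V-JEPA); every module size and variable name is invented for illustration and tells you nothing about 'Mango' itself.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy latent world model: encode two consecutive video frames, predict the
# second frame's latent from the first, and train on the prediction error.
# All shapes and module sizes here are invented for illustration.

class LatentPredictor(nn.Module):
    def __init__(self, latent_dim=256):
        super().__init__()
        # Encoder: maps a 64x64 RGB frame to a latent vector.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=4, stride=2, padding=1),   # 64 -> 32
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2, padding=1),  # 32 -> 16
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * 16 * 16, latent_dim),
        )
        # Predictor: guesses the next frame's latent from the current one.
        self.predictor = nn.Sequential(
            nn.Linear(latent_dim, latent_dim),
            nn.ReLU(),
            nn.Linear(latent_dim, latent_dim),
        )

    def forward(self, frame_t, frame_t_plus_1):
        z_now = self.encoder(frame_t)
        # Real systems need extra tricks (stop-gradients, momentum encoders)
        # to stop the representation collapsing; detach() stands in for those.
        z_next = self.encoder(frame_t_plus_1).detach()
        z_pred = self.predictor(z_now)
        return F.mse_loss(z_pred, z_next)

model = LatentPredictor()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

# One training step on a dummy pair of consecutive frames.
frame_t = torch.randn(8, 3, 64, 64)         # batch of 8 "current" frames
frame_t_plus_1 = torch.randn(8, 3, 64, 64)  # the frames that follow
opt.zero_grad()
loss = model(frame_t, frame_t_plus_1)
loss.backward()
opt.step()
print(f"prediction loss: {loss.item():.4f}")
```

The key point is that the model is never told what a ball or a glass is; it is rewarded purely for guessing the future correctly, and intuitive physics falls out as a by-product.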
This is where action prediction architectures come into play. These are the engines that allow the model not just to observe a sequence but to make an educated guess about the future. If a model sees a glass teetering on the edge of a table, this architecture is what allows it to predict that the most likely outcome is a smash on the floor, not the glass suddenly flying towards the ceiling. This predictive power is the bedrock of genuine reasoning and a huge step towards embodied AI training, where a robot can learn to navigate a new kitchen without being explicitly programmed for every single object and scenario it might encounter.
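To make "action prediction" concrete, here is a hypothetical extension of the toy model above: a dynamics network that takes the current latent state plus a candidate action and predicts the next state. Rolling it forward lets an agent imagine outcomes before committing to one. The `dynamics` network, the `rollout` helper, and all dimensions are assumptions for illustration, not Meta's architecture.

```python
import torch
import torch.nn as nn

# Hypothetical action-conditioned dynamics model: given the current latent
# state and a candidate action, predict the next latent state.

latent_dim, action_dim = 256, 8

dynamics = nn.Sequential(
    nn.Linear(latent_dim + action_dim, 512),
    nn.ReLU(),
    nn.Linear(512, latent_dim),
)

def rollout(z, actions):
    """Imagine a trajectory entirely in latent space, one action at a time."""
    states = [z]
    for a in actions:
        z = dynamics(torch.cat([z, a], dim=-1))  # predict the next state
        states.append(z)
    return torch.stack(states)

z0 = torch.randn(1, latent_dim)                        # the encoded scene
plan = [torch.randn(1, action_dim) for _ in range(5)]  # 5 candidate actions
imagined = rollout(z0, plan)
print(imagined.shape)  # torch.Size([6, 1, 256]): start state + 5 imagined futures
```

A planner would then score each imagined trajectory (does the glass end up smashed?) and choose the actions with the best predicted outcome, which is exactly the "reason, plan, and act" loop described below.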


Meta’s Two-Pronged Attack: ‘Mango’ and ‘Avocado’

According to reports from a recent internal Q&A covered by TechCrunch, Meta’s answer to this challenge comes in two flavours. ‘Avocado’ is a text-based model, likely a successor to Llama, but with a sharpened focus on coding. Meanwhile, ‘Mango’ is the big one: an image and video model designed to pioneer this new visual understanding.

The Vision from the Top

The project is under the new Meta Superintelligence Labs (MSL), led by Alexandr Wang. During the internal meeting, Wang articulated the vision clearly: Meta aims to make the text model “better at coding while also exploring new world models that understand visual information and can reason, plan, and act without needing to be trained on every possibility.”
This is the holy grail. An AI that doesn’t just regurgitate its training data but can generalise from it to solve novel problems. This is precisely what separates today’s clever chatbots from a truly intelligent agent. By developing these as multimodal foundation models – systems that fluidly integrate text, images, and video – Meta is hoping to build a platform that understands context, not just content.
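In rough code terms, "multimodal foundation model" usually means something like the sketch below: each modality gets its own embedding into a shared token space, and one transformer attends across the whole combined sequence. This is a generic illustration of the pattern with invented dimensions; nothing in it is specific to 'Mango' or 'Avocado'.

```python
import torch
import torch.nn as nn

# Generic multimodal pattern: embed each modality into a shared token space,
# then let a single transformer attend across the combined sequence.

d_model = 512

text_embed = nn.Embedding(32000, d_model)      # text tokens -> vectors
patch_embed = nn.Linear(16 * 16 * 3, d_model)  # 16x16 image patches -> vectors
frame_embed = nn.Linear(1024, d_model)         # per-frame video features -> vectors

backbone = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True),
    num_layers=4,
)

text = text_embed(torch.randint(0, 32000, (1, 12)))     # 12 text tokens
patches = patch_embed(torch.randn(1, 64, 16 * 16 * 3))  # 64 image patches
frames = frame_embed(torch.randn(1, 8, 1024))           # 8 video frames

# One sequence, one model: attention runs across all three modalities at once.
tokens = torch.cat([text, patches, frames], dim=1)
out = backbone(tokens)
print(out.shape)  # torch.Size([1, 84, 512])
```

Because every token lives in the same space, a caption can attend to a video frame as easily as to another word: that is the "context, not just content" idea in practice.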

A House in Disarray?

This is all very ambitious, and it sounds fantastic in a press release. But the reality inside Meta's AI division appears to be far more turbulent. The creation of MSL itself was the result of a significant corporate restructuring in 2025, a move that ruffled more than a few feathers.
The company has been haemorrhaging top AI talent. Several researchers recruited for the superintelligence lab have already reportedly quit. Most notably, AI luminary and Turing Award winner Yann LeCun, a long-time Meta stalwart, recently departed to start his own company. When someone of LeCun’s stature walks out the door, you have to ask what’s going on behind the scenes. It suggests a deep-seated frustration or a lack of faith in the direction of travel.
Whilst Meta has successfully integrated its AI assistant into its social media apps, reaching “billions of users,” it has yet to produce a breakout, category-defining AI product like ChatGPT or Gemini. The pressure is immense. Meta is playing catch-up, and it’s doing so whilst trying to reorganise a fractious and arguably demoralised team. It’s like trying to redesign a Formula 1 car mid-race.


The Future is Visual (If You Can Build It)

If Meta can pull this off, the implications are staggering. Advanced visual world models are the key to unlocking the next wave of AI applications.
True Embodied AI: Forget clumsy robots that need months of programming. Imagine robots that can learn to perform complex manufacturing, logistics, or even household tasks simply by watching a human do them. Embodied AI training becomes exponentially faster and more effective.
Next-Generation Content Creation: Creative tools could move beyond simple image generation to creating entire dynamic video scenes with realistic physics and character interactions based on a simple prompt.
Smarter Autonomous Systems: Self-driving cars could become far more robust, able to anticipate unpredictable pedestrian or vehicle behaviour not because they’ve seen that exact scenario a million times, but because they understand the general principles of movement and intent.
The success of these models hinges entirely on the sophistication of their action prediction architectures. This is the core intellectual property, the piece of the puzzle that separates a great AI company from a good one. It’s the engine of reasoning that Meta must perfect if it hopes to compete with the advances we see from Google’s Project Astra and OpenAI’s continuous model updates.

Can Meta Seize the World?

Meta’s ambition is undeniable. In ‘Mango’ and ‘Avocado’, it has a roadmap that targets the absolute cutting edge of AI research. These multimodal foundation models are precisely what is needed to push AI from a clever tool to a genuine partner in problem-solving.
But a roadmap is not a destination. The company faces a brutal competitive landscape and, more damagingly, significant internal dissent and talent flight. The departure of key figures like Yann LeCun raises serious questions about leadership and strategy. Meta is betting the farm on building a new world model, but it first needs to get its own house in order. The 2026 release date, as reported, feels a long way away in an industry that moves at light speed.
The race is on, and whilst the technology is fascinating, the human drama behind it might be even more compelling. What do you think? Is this a genuine masterstroke from Meta, or a desperate lunge from a company that’s already been outmanoeuvred? Let me know your thoughts below.
