Today's smartest AI is a genius locked in a dark room — fed nothing but text slipped under the door.
That's the real issue with the "LLMs can't reach AGI" debate. People keep asking whether AI is intelligent enough. The better question is whether AI is fed enough.
What We're Actually Giving AI
A human being takes in the world constantly. Every second, you process depth, sound, facial expression, tone of voice, the weight of an object in your hand, the warmth of a room. You move through physical space and your brain stitches all of it into a coherent understanding of reality.
What do we give AI? Text. Occasionally an image. A thin slice of the world compressed into tokens.
We hand AI a few lines of text and wonder why it can't read the room. The answer isn't smarter models. It's richer input.
The Real Race
The next breakthrough in AI won't come from a larger parameter count or a cleverer training trick. It'll come from the team that cracks multimodal, real-world data at scale — vision, audio, 3D space, motion, video — and feeds it to agents fast enough to matter.
Whoever does that doesn't just build a better model. They build the infrastructure the entire next generation of AI runs on.
The Teams Already Feeding AI the World
A few teams have understood this early:
World Labs (Fei-Fei Li)
Teaching AI to see and reason in 3D space, not just read about it. Spatial intelligence means understanding where things are, how they move, and what happens when they interact: things text alone conveys poorly.
Meta's Project Aria
Sensor glasses that capture the world from a first-person, human point of view. This data trains AI and robots in environments that look like real life, not curated benchmarks.
Physical Intelligence
Models that learn from real robots folding laundry and bussing tables. When AI is trained on physical tasks in the real world, it learns how the world actually behaves — not just how humans describe it.
NVIDIA Cosmos
World models plus the data pipeline to feed physical AI at scale, aimed at workloads such as autonomous driving and robotics. Real data, simulated data, combined and served to agents at volume.
What This Means
This isn't a niche research story. It's a platform shift — the kind that produces trillion-dollar companies.
The team that wins won't necessarily have the most talented researchers or the most compute. It will be the one that figures out how to capture the physical world and feed it to AI faster than anyone else.
The next breakthrough won't be AI that's smarter. It'll be AI that can finally see.