The Silent Architecture That May Finally Teach Machines Common Sense

Neural lattice visualization of Meta's JEPA AI architecture, with prismatic pathways and holographic depth representing abstract knowledge.

Large language models have "absolutely no way" of achieving human-level intelligence despite their impressive capabilities, declares Yann LeCun, who instead has quietly guided Meta toward a fundamentally different AI architecture that eschews generation altogether.

This alternative path toward machine reasoning, writes End of Miles, represents perhaps the most significant architectural divergence from today's dominant AI paradigm—one that Meta's research teams have been refining for years while much of the industry pursued increasingly larger generative models.

Beyond Generation: Representation Over Recreation

LeCun's architecture, called JEPA (Joint Embedding Predictive Architecture), fundamentally rejects the generative approach that powers virtually every major AI system today. The Turing Award winner explains the profound difference in approach through the lens of physics:

"The problem of understanding the world does not go through the solution to generating video at the pixel level. I don't need to know if I take this cup of water and I spill it, I cannot entirely predict the exact path that the water will follow on the table and what shape it's going to take. But at a certain level of abstraction, I can predict that the water will spill." Yann LeCun, Meta Chief AI Scientist

The Meta scientist contends that current generative models—whether text, image, or video-based—fundamentally cannot develop true comprehension or reasoning capabilities precisely because they attempt to predict every detail. JEPA instead focuses on learning abstract representations that can make predictions at a conceptual level.

The Architecture Behind True Understanding

Unlike generative models that attempt to reconstruct or predict missing pixels, words, or tokens directly, JEPA takes an entirely different approach. The AI pioneer explains:

"We take the original full image and the corrupted version, we run them both through neural nets. Those neural nets produce representations of those two images—the initial one and the corrupted one—and we train another predictor to predict the representation of the full image from the representation of the corrupted one." LeCun

This methodology forces the system to develop abstract understanding rather than focusing on surface-level details. The Meta research team has already extended this approach beyond images to video, creating what they call "V-JEPA."
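To make the training loop LeCun describes more concrete, here is a minimal PyTorch sketch of the joint-embedding idea: a corrupted view and the full image each pass through an encoder, and a predictor is trained to map the corrupted view's representation onto the full image's representation. This is an illustrative simplification, not Meta's code; the tiny linear encoders, the block-masking corruption, and the momentum-updated target encoder (used here to discourage representational collapse) are assumptions made for the sake of a runnable example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy encoders and predictor; published JEPA models use Vision Transformer backbones.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 256))          # context branch (corrupted image)
target_encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 256))   # target branch (full image)
predictor = nn.Sequential(nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, 256))

target_encoder.load_state_dict(encoder.state_dict())  # start both branches identical
optimizer = torch.optim.AdamW(
    list(encoder.parameters()) + list(predictor.parameters()), lr=1e-4
)

def corrupt(images: torch.Tensor) -> torch.Tensor:
    """Zero out a block of each image (a crude stand-in for JEPA-style patch masking)."""
    corrupted = images.clone()
    corrupted[:, :, 16:48, 16:48] = 0.0
    return corrupted

def train_step(images: torch.Tensor) -> float:
    corrupted = corrupt(images)
    with torch.no_grad():                       # target representation gets no gradients
        target_repr = target_encoder(images)
    context_repr = encoder(corrupted)
    predicted_repr = predictor(context_repr)    # predict the full image's representation
    loss = F.mse_loss(predicted_repr, target_repr)  # loss lives in representation space, not pixel space
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # Slowly move the target encoder toward the context encoder (exponential moving average).
    with torch.no_grad():
        for p_t, p_c in zip(target_encoder.parameters(), encoder.parameters()):
            p_t.mul_(0.996).add_(p_c, alpha=0.004)
    return loss.item()

loss = train_step(torch.randn(8, 3, 64, 64))
```

The key design point is that the reconstruction target is the other branch's representation rather than raw pixels, so the model is free to discard unpredictable surface detail and keep only the abstract structure it can actually anticipate.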

Detecting The Impossible

The NYU-affiliated researcher's most compelling evidence for JEPA's potential comes from its ability to recognize physically impossible scenarios—a capability that has largely eluded generative AI despite massive scaling efforts.

"We have a ball rolling and it rolls and it starts behind a screen and then the screen comes down and the ball is not there anymore. You measure the prediction error of the system as you show the video, and when something impossible occurs, the prediction error goes through the roof." The AI specialist

This represents a significant step toward machines developing common sense understanding of how the physical world operates—what's possible and what isn't—without explicitly being told.
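In practice, the measurement LeCun describes can be sketched as a simple loop: run a video through a frozen model frame by frame, compare the predicted next-frame representation to the actual one, and flag frames where the error jumps. The `encoder` and `predictor` callables and the spike-detection heuristic below are placeholders for illustration, not any published V-JEPA interface.

```python
import torch

@torch.no_grad()
def surprise_scores(frames: torch.Tensor, encoder, predictor) -> list[float]:
    """Per-frame prediction error for a video tensor of shape (T, C, H, W).

    A sharp jump in error suggests the model's learned "physics" was violated,
    e.g. a ball that silently vanishes behind an occluder.
    """
    errors = []
    prev_repr = encoder(frames[0:1])
    for t in range(1, frames.shape[0]):
        predicted = predictor(prev_repr)        # what the model expects next, in representation space
        actual = encoder(frames[t:t + 1])       # what actually happened
        errors.append(torch.mean((predicted - actual) ** 2).item())
        prev_repr = actual
    return errors

def flag_impossible(errors: list[float], factor: float = 5.0) -> list[int]:
    """Flag frames whose error exceeds `factor` times the running median (an arbitrary threshold)."""
    flagged = []
    for t, err in enumerate(errors):
        history = sorted(errors[: t + 1])
        median = history[len(history) // 2]
        if median > 0 and err > factor * median:
            flagged.append(t + 1)
    return flagged
```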

True Planning and Reasoning Emerge

The technology extends beyond passive understanding into the realm of action planning—one of the most crucial missing capabilities in current AI systems. Meta's research team has developed action-conditioned versions that can predict outcomes from sequences of actions.

"The model basically can say here is the state of the world at time T, here is an action you might take, I can predict the state of the world at time T plus one in this abstract representation space. Now you can imagine you can have the system imagine what would be the outcome of a sequence of actions. If you give it a goal saying I want the world to look like this at the end, can you figure out a sequence of actions to get me to that point? It can actually figure out by search a sequence of actions that will produce that result. That's planning. That's reasoning. That's actual reasoning and actual planning." LeCun

This capability contrasts sharply with current agentic AI systems, which LeCun characterizes as largely "regurgitating plans that the system has already been trained on" rather than genuinely reasoning through new problems.

When Will This Transform AI?

While declining to offer specific commercialization timelines, LeCun acknowledges the gap between research prototypes and deployed systems. "That's where things become complicated," the professor notes, suggesting that JEPA's core capabilities might begin to appear in practical systems within "three to five years."

This timeline coincides with what LeCun describes as large language models hitting "diminishing returns"—a point where simply adding more data and computation no longer yields significant improvements, ultimately necessitating fundamentally new approaches like JEPA.

While much of the industry continues its focus on generative models, Meta's research signals a potentially significant architectural shift—one that might finally bridge the gap between statistical pattern matching and genuine machine reasoning.
