World Models
The term "world model" has become one of the most overloaded phrases in AI. Yann LeCun uses it to describe a learned internal simulator — a system that can predict the next state of the world from the current one, operating in some abstract representation space. Fei-Fei Li and the spatial intelligence community mean something closer to a grounded physical model — one that understands objects, scenes, and the causal structure of three-dimensional reality. And large language models, whether or not anyone intended it, have developed their own kind of world model: a statistical projection of how concepts, entities, and events relate to one another, learned from the compressed residue of human communication.
These are all different things. They share a name and almost nothing else.
We think the useful question is not "which of these is the real world model?" but rather: what role does a world model actually play, and what properties must it have to be useful? Our answer is that a world model is a particular kind of compression — one that exploits the deep structural fact that reality is organized in layers, each with its own effective laws. That structure is what makes projection possible. And projection is what makes intelligence practical.
The world produces more data than any system can ingest
Every day, the world generates roughly 0.4 zettabytes of new data. Converted to tokens, that is on the order of 10^20 — over a million times larger than the training set of any frontier language model. Sensor data, financial transactions, scientific publications, satellite imagery, news, social media, government filings, supply chain signals — the list compounds daily. And this is not an artifact of our current moment. It is structural.
The gap between what the world produces and what any system can ingest is not closing. It is widening, permanently. As systems become more intelligent, they generate even more data — analyses, simulations, synthetic datasets, intermediate reasoning traces. Each new capability creates new data-production mechanisms. A frontier model's training set captures a vanishingly thin slice of the world's information at any given moment, and that slice shrinks in relative terms with every passing day. The world can never be fully compressed, because the mechanisms that produce new information are themselves proliferating. This is not a bottleneck to be solved by better hardware or larger context windows. It is a mathematical fact about the relationship between an open, evolving world and any finite system trying to represent it.
This has a direct consequence for what a "world model" can be: it cannot be a complete representation of the world. It is necessarily a projection — a lower-dimensional approximation that captures enough structure to be useful for some set of tasks. The question is not whether to project — you must — but whether you project wisely.
All world models are projections
Once you accept that completeness is impossible, the question becomes: what makes a projection good enough? And why should projection work at all?
There is a deep reason it works, and Philip Anderson articulated it in 1972. In "More Is Different," Anderson observed that the universe is not a single undifferentiated system — it is organized in layers, each governed by its own effective laws. Particle physics does not determine chemistry in any practical sense; chemistry does not determine biology; biology does not determine economics. At each level, new regularities emerge that are largely independent of the details below. You can model supply chain dynamics without modeling protein folding, because the economic level has its own causal structure — its own variables, its own attractors, its own phase transitions — that is screened off from the molecular level except at narrow boundary conditions.
This layered structure is not merely convenient. It is the reason world models are possible at all. If reality were a single entangled system with no natural decomposition into levels, projection would be hopeless — every approximation would be fatally incomplete. But because reality factorizes into semi-autonomous layers, a projection that captures the right level for your task can be extraordinarily effective.
A world model should capture a first-order approximation of how things work across more than a narrow domain. This could mean understanding how economic systems behave — how supply chains respond to shocks, how capital flows shift under regulatory pressure, how technological adoption follows S-curves with predictable inflection points. The key property is that the model captures relationships between domains, not just facts within them. It must know where the layers interact: the points where a geopolitical decision reshapes a supply chain, or where a physical constraint bounds an economic trajectory.
This brings us to two related observations that are both true and both important:
Emergent phenomena can be identified without complete knowledge. Moore's law emerged from observing transistor counts over time — you did not need to simulate every atom in a silicon wafer to notice the pattern. Scaling laws in deep learning were discovered empirically. Ant colonies exhibit complex behavior from simple rules. A person knows they will be hungry tomorrow without modeling every cell in their body. Our internal research on emergence has found that this is not coincidence — the same mathematical structure recurs across wildly different substrates. Phase transitions in magnetic systems (Ising models), capability jumps in neural scaling, adoption cascades in markets — all share the same formal anatomy: control parameters, order parameters, critical exponents. The fact that identical abstractions apply across physics, biology, and economics is itself evidence that projection works. The world's layers share structural motifs even when they share no substrate.
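The shared anatomy of phase transitions can be made concrete in a few lines. The mean-field Ising model (a standard textbook idealization, not our internal research code) compresses an entire magnet into one self-consistency equation, m = tanh(m/T), and the control-parameter/order-parameter template it exhibits is the same one that recurs in scaling curves and adoption cascades:

```python
import math

def order_parameter(T, iters=200):
    """Mean-field Ising magnetization: solve m = tanh(m / T) by fixed-point
    iteration. T is the control parameter; m is the order parameter."""
    m = 0.5  # small symmetry-breaking seed
    for _ in range(iters):
        m = math.tanh(m / T)
    return m

# Sweeping T across the critical point T_c = 1 produces the signature jump:
above = order_parameter(T=2.0)   # disordered phase: m -> 0
below = order_parameter(T=0.5)   # ordered phase: m -> a nonzero value
```

Nothing in the code mentions atoms or spins; only the abstract template survives, which is the point.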
Domain-specific reasoning does not require universal knowledge. To forecast how many chips will be produced in 2028, you do not need to know how to run a specific chemical experiment. You need to understand fab construction timelines, capital expenditure commitments, yield curves, and geopolitical constraints on equipment exports. The relevant knowledge is structured and bounded — it just spans multiple domains that are rarely combined. Anderson's insight explains why this is principled rather than merely pragmatic: the economic layer genuinely has its own laws.
But there is a harder truth lurking beneath this optimism. Stephen Wolfram's concept of computational irreducibility tells us that parts of the world are not just difficult to compress — they are mathematically impossible to compress. Even if you know the exact rules governing a system, predicting its behavior may require running the computation step by step, with no shortcut possible. The weather, the detailed evolution of an economy, the precise trajectory of a turbulent fluid — these are computationally irreducible. No model, no matter how powerful, can fast-forward them.
And yet we build weather forecasts and economic models that work. How? Because computational reducibility exists in pockets. The regularities we exploit — physical laws, seasonal patterns, mean-reverting economic relationships, demographic trends — are islands of compressibility in a sea of irreducibility. A world model's job is to find and exploit those pockets: to identify which aspects of a system admit compact, predictive descriptions and which must be treated as irreducible noise. The craft is in knowing the boundary.
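The contrast between irreducible and reducible systems shows up even in Wolfram's own elementary cellular automata. Rule 254 admits a closed form (a pocket of reducibility: the live region simply grows one cell per side per step), while rule 30 is conjectured to have no such shortcut — to know its center column at step n, you run n steps. A sketch (function names and grid sizes are my own choices for illustration):

```python
def step(cells, rule):
    """One step of an elementary cellular automaton (Wolfram rule numbering)."""
    n = len(cells)
    return [
        (rule >> (cells[(i - 1) % n] * 4 + cells[i] * 2 + cells[(i + 1) % n])) & 1
        for i in range(n)
    ]

def run(rule, width=101, steps=40):
    cells = [0] * width
    cells[width // 2] = 1  # a single live cell in the center
    history = [cells]
    for _ in range(steps):
        cells = step(cells, rule)
        history.append(cells)
    return history

# Rule 254 is reducible: cell i is alive at step t iff |i - center| <= t.
h254 = run(254)
# Rule 30 is conjecturally irreducible: no known closed form for its center column.
h30 = run(30)
```

For rule 254 you can answer "what happens at step one million?" instantly from the closed form; for rule 30 the model's only honest move is to simulate — or to treat the column as noise.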
Both observations point in the same direction: a useful world model is one that captures the right abstractions at the right level of granularity for the task at hand. It operates at Anderson's effective level, exploiting Wolfram's pockets of reducibility, while remaining honest about what lies outside them. It does not need to be deep everywhere. It needs to be deep where it matters and connected where things interact.
Intelligence is search and compression
Our position rests on a simple claim: all intelligence is search and compression. And all compression can be phrased as a search problem.
Consider a trivial example. If you observe the sequence 10101010..., you can compress it to (10)^n. This compression is also the result of a search: find the shortest description of this sequence across all possible descriptions. The compression is the search result.
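This search can be written out directly. A toy sketch, with a deliberately tiny description space (just "a unit repeated k times" — the function name is invented for illustration):

```python
def shortest_repeat_description(s):
    """Search the space of descriptions 'unit repeated k times', shortest unit
    first, and return the first one that reproduces s exactly."""
    for k in range(1, len(s) + 1):
        unit = s[:k]
        if len(s) % k == 0 and unit * (len(s) // k) == s:
            return unit, len(s) // k
    return s, 1  # incompressible within this description space

unit, reps = shortest_repeat_description("10" * 8)  # → ("10", 8)
```

The compressed form `("10", 8)` is not computed by a formula; it is found, as the output of an explicit search over candidate descriptions.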
This is not just a toy observation. It scales to every form of understanding. A scientific law is a compression of observed data. A market thesis is a compression of signals. A forecast is a compressed representation of possible futures, weighted by how the world is likely to evolve. In each case, the act of understanding is the act of finding a compact, predictive description — and finding that description requires searching through a vast space of possible descriptions.
What is remarkable is that this claim has recently moved from philosophical intuition to mathematical fact. A May 2025 paper (arXiv 2505.15784) formally proved that LLM training via next-token prediction computationally approximates Solomonoff induction — the theoretically optimal method for prediction under uncertainty, grounded in algorithmic information theory. The loss function that every language model minimizes is, viewed through the right lens, a program-length optimization: the model learns to assign higher probability to data that admits shorter descriptions. Compression is not a metaphor for what these systems do. It is literally the objective, and Solomonoff's framework tells us why that objective connects to intelligence.
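The identification of probability with description length is Shannon's: under an optimal prefix code, a token assigned probability p costs -log2(p) bits, so minimizing next-token cross-entropy is minimizing the length of the code the model implicitly assigns to its data. A two-line illustration with a toy distribution:

```python
import math

# An optimal code gives a token of probability p a length of -log2(p) bits,
# so lowering cross-entropy literally shortens the description of the data.
probs = [0.5, 0.25, 0.25]                       # a toy next-token distribution
code_lengths = [-math.log2(p) for p in probs]   # → [1.0, 2.0, 2.0] bits
expected_bits = sum(p * l for p, l in zip(probs, code_lengths))  # → 1.5 bits
```

A model that sharpens this distribution toward the true one shortens the expected code length — which is exactly what the training loss rewards.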
The deeper implication is stark. Kolmogorov complexity — the length of the shortest program that produces a given output — is uncomputable. It reduces to the halting problem. You can never be certain you have found the shortest description; there might always be a shorter one. This means the search for better compression is provably inexhaustible. The recently proposed Kolmogorov Test (2025) makes this concrete: it defines intelligence as the ability to find shorter programs for data, and because the theoretical optimum is uncomputable, this is the only intelligence test that can never saturate. No system, no matter how capable, will ever complete the compression. There will always be a more compact description waiting to be found.
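Uncomputability does not leave practice empty-handed: any real compressor yields a computable upper bound on description length, and the gap between that bound and the unreachable optimum is precisely the room left for more intelligence. A sketch using zlib as a stand-in compressor:

```python
import os
import zlib

def description_length(data: bytes) -> int:
    """A computable UPPER BOUND on description length. The true Kolmogorov
    complexity is uncomputable, so this bound can always, in principle, be
    beaten by a better compressor."""
    return len(zlib.compress(data, 9))

structured = b"10" * 4096      # highly regular: a short program generates it
random_ish = os.urandom(8192)  # no structure for the compressor to exploit
```

The compressor certifies that `structured` has a short description; it can never certify that `random_ish` has no shorter one.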
Viewed in this frame, a world model is not an end in itself. The goal of having a world model is to make search faster and better. A good world model constrains the search space. It tells you which descriptions are plausible before you evaluate them. It provides priors that eliminate most of the search space, letting you focus on the hypotheses that actually matter. And because the search for better compression is provably infinite, the world model itself must continuously evolve — there is no final version, only a current best approximation that is always improvable.
This is why cheap, continuously updating world models are so valuable. They are not oracles. They are context engines — systems that construct the right frame for a question before the question is answered.
Cheap world models as context engines
The practical implication is this: the quality of any AI system's output — whether it is a forecast, a decision recommendation, or a research synthesis — depends heavily on the context that informs it. Context is not just "retrieved documents." It is a structured representation of what is relevant, how things relate to each other, what has changed recently, and what the system is uncertain about.
A world model that is continuously updated with new information — new data, new events, new relationships between entities — functions as a living context engine. When a user asks a question, the world model does not answer the question directly. It constructs the right context for answering the question: the relevant facts, the relevant uncertainties, the relevant connections to adjacent domains.
There is a theoretical framework that makes precise what "right context" means. Karl Friston's active inference describes an agent that does not passively wait for information to arrive — it actively seeks out the observations that will maximally reduce its uncertainty about whatever matters for the task. This is "epistemic foraging": the system directs its attention toward the gaps in its model that, if filled, would most change its beliefs or decisions. A world model designed around this principle does not just store and retrieve — it identifies what it needs to know and goes looking for it. The chain we describe below is not a passive pipeline. It is, in its ideal form, an active inference loop.
This is the mechanism by which world models improve everything downstream:
- Better context construction leads to better retrieval — the system knows what to look for, not just what keywords to match. It forages epistemically, targeting the information that will most reduce decision-relevant uncertainty.
- Better retrieval leads to better reasoning — the model operates on more relevant, more complete information.
- Better reasoning leads to better search — the system explores the right hypothesis space and converges on more accurate, more calibrated outputs.
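Epistemic foraging has a standard formalization: score each candidate query by its expected information gain — the expected drop in posterior entropy — and ask the highest scorer first. A minimal sketch (the two-hypothesis setup and the likelihood numbers are invented for illustration, not drawn from any real system):

```python
import math

def entropy(p):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(q * math.log2(q) for q in p if q > 0)

def expected_info_gain(prior, likelihoods):
    """Expected entropy reduction from one observation.
    likelihoods[h][o] = P(observation o | hypothesis h)."""
    gain = 0.0
    for o in range(len(likelihoods[0])):
        p_o = sum(prior[h] * likelihoods[h][o] for h in range(len(prior)))
        if p_o == 0:
            continue
        posterior = [prior[h] * likelihoods[h][o] / p_o for h in range(len(prior))]
        gain += p_o * (entropy(prior) - entropy(posterior))
    return gain

prior = [0.5, 0.5]
query_a = [[0.9, 0.1], [0.2, 0.8]]  # diagnostic: outcomes depend on hypothesis
query_b = [[0.5, 0.5], [0.5, 0.5]]  # uninformative: outcomes identical either way
```

An active system ranks `query_a` above `query_b` and spends its retrieval budget there — it forages for the observation that will actually move its beliefs.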
But there is a subtlety that our internal research has made vivid. When we query the same model with the same factual question using different output formats — logprob-based extraction, text-output probabilities, structured-output probabilities — the expressed beliefs diverge significantly. The format of the answer changes the model's expressed belief. This is not a minor implementation detail. It means that context construction must account not only for what information is provided but for how the model is asked to process and express it. The interface between world model and reasoning system is itself a variable that shapes the output, and getting it wrong can silently distort everything downstream.
The entire chain depends on the world model at the top. If the world model is stale, the context is stale, and everything downstream degrades. If the world model is cheap to update, the entire system stays current. And if the world model actively seeks the information it lacks — rather than passively waiting for updates — the entire system converges faster on what matters.
What we are building
Our goal at Eternis is to build a continuously evolving world model that functions as the foundation for decision-making at scale.
There is a theoretical result that frames the ambition. Marcus Hutter's AIXI — the mathematically optimal general agent — does not separate its world model from its planning system. In AIXI, the model of the world and the process of deciding what to do are aspects of a single optimization: the agent simultaneously maintains the best possible compression of its experience and selects actions that maximize expected reward under that compression. AIXI is uncomputable (it requires solving Kolmogorov complexity at every step), so no practical system can implement it. But the architectural lesson is real: the tighter the integration between world model and decision system, the closer to optimal. A world model that lives in one system and a planner that lives in another, communicating through a narrow interface, will always lose to a system where model and planner share representations and co-evolve.
This is not a general-purpose knowledge graph or a static embedding of Wikipedia. It is a system that ingests structured and unstructured information across domains — markets, geopolitics, technology, policy — and maintains a living, queryable representation of how these domains interact and evolve. It updates as the world changes. It knows what it doesn't know. And it is designed from the ground up to serve one purpose: making the downstream search for answers, forecasts, and decisions as fast and reliable as possible.
What the system is doing, at a formal level, is something analogous to renormalization. In physics, the renormalization group is a technique for integrating out irrelevant microscopic details to find the effective description at the scale that matters — you do not need to track every molecule to understand fluid dynamics, because the relevant behavior at the macro scale is captured by a handful of effective parameters. Recent work (ICLR 2025) has established rigorous connections between the layer-wise abstraction that deep networks perform and renormalization group flow: each layer integrates out finer-grained details, progressively building representations at the scale appropriate for the task. Our world model does the same thing for decision-relevant information. It takes the full, irreducibly complex stream of world data and integrates out what does not matter for the decision at hand, producing an effective description at the right scale — not too coarse to miss the signal, not too fine to drown in noise.
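The renormalization analogy can be shown in miniature. Block-averaging is the simplest coarse-graining map: it integrates out fluctuations below the block scale and keeps the effective trend. A one-dimensional toy, not our system's actual pipeline:

```python
def coarse_grain(signal, block=4):
    """Block-average: integrate out sub-block fluctuations, keep the
    effective description at the coarser scale."""
    return [sum(signal[i:i + block]) / block for i in range(0, len(signal), block)]

# Microscopic data: a slow trend plus fast, decision-irrelevant alternating noise.
micro = [t / 100 + (0.5 if t % 2 else -0.5) for t in range(100)]
macro = coarse_grain(micro, block=4)  # the noise cancels; the trend survives
```

At the micro scale the signal zigzags violently; at the macro scale it is a clean monotone trend — a handful of effective parameters where there were a hundred noisy ones.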
Every product we build — from our forecasting models to Axion — draws on this world model as its foundation. The forecaster uses it for context assembly. Axion uses it to construct decision-relevant evidence. The quality framework we published on forecaster evaluation is, in part, a way to measure whether the world model is doing its job — whether the projection preserves the structure that matters and discards what does not.
The world will always produce more data than can be ingested — a million times more, and the ratio is growing. The question is whether you have a system that can project that data into the right shape, at the right time, for the decision that matters. A system that finds the pockets of compressibility in a computationally irreducible world. A system that knows which layer of reality to operate at, and when to cross between layers. That is the world model problem. And it is what we are working on.
