Intelligence has only ever copied itself.
Every breakthrough in artificial intelligence has been an act of imitation. Neural networks copy neurons. Diffusion models copy entropy. Genetic algorithms copy evolution. Transformers copy analogy. Reinforcement learning copies trial and reward. Bayesian inference copies belief update. Even pre-training copies the way Locke and Hume thought humans learn — as a blank slate written upon by experience.
The deep claim of this atlas is this: AI is not a single technology. It is a parliament of borrowed paradigms, each with its own ancestor in nature or in human thought. To understand AI is to understand what each paradigm imitates, how it imitates it, and — crucially — what is still missing.
Two ancient capacities of the human mind — Einsteinian imagination and felt emotion — have no artificial analogue. They are the atlas's terra incognita. Everything else, we have already built.
Eleven paradigms. Two open questions.
Each cell below is a self-contained chapter — historical, mechanical, applied, speculative.
§ I · Mathematics · Fractal Geometry
Self-similar structure across scales, recursively expanded into pixels.
§ II · Physics · Entropy
Order built by reversing thermal noise — running the second law backwards.
§ III · Biology · Evolution by Natural Selection
Iterated selection and variation: the only process in the universe known to produce design without a designer.
§ IV · Brain · Structure · The Biological Brain
Eighty-six billion specialised cells, abstracted into matrix multiplications.
§ V · Mind · Empiricism · Learning from History
John Locke's blank slate, written on by the entire internet.
§ VI · Mind · Empiricism · Learning through Reward and Punishment
Edward Thorndike's law of effect, scaled to superhuman game-play and reasoning.
§ VII · Mind · Rationalism · Symbolic Reasoning
From a handful of axioms, infinitely many sentences. From a handful of rules, the whole of mathematics.
§ VIII · Mind · Analogy · Analogical Reasoning
Everything thinks by similarity. The trick is computing it in a high-dimensional vector space.
§ IX · Mind · Bayesianism · Probabilistic Belief Update
Beliefs are probability distributions. Evidence reshapes them. Rationality is the arithmetic of revision.
§ X · Mind · The Open Question · Einsteinian Imagination
What machine could imagine itself riding alongside a beam of light?
§ XI · Mind · The Open Question · Emotion-Guided Cognition
Without feeling, there is no purpose. Without purpose, no thought.
Fractal Geometry
Self-similar structure across scales, recursively expanded into pixels.
- Domain: Mathematics
- Mimics: Fractal Geometry
- Method: Fractal Generative Models
- Status: Frontier
History
Benoît Mandelbrot named fractals in 1975, but the underlying geometry — coastlines, ferns, clouds, river deltas — predates humans by billions of years. Mandelbrot's quiet provocation was that the smooth Euclidean shapes of school geometry are the exception, and the broken, recursive forms of nature are the rule. For decades, fractals lived in mathematics, computer graphics, and chaos theory. Then, in 2025, a group at MIT around Kaiming He proposed Fractal Generative Models, treating the very architecture of a generative network as itself a fractal — generators inside generators — and producing pixel-by-pixel images of unprecedented coherence. For the first time, an AI did not just *render* fractals; it became one.
Mechanism
A fractal generative model stacks autoregressive generators inside each other: an outer generator predicts coarse patches, each patch is passed to an identical (but smaller) generator predicting finer patches, and so on down to single pixels. The same module repeats at every scale, exactly the way a coastline repeats at every zoom. Parameters are shared across scales, which makes the network shockingly small for the resolution it can address. The architecture is itself a recursion: the *structure* of computation mirrors the *structure* of the world it models.
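The structural idea can be sketched in a few lines. The toy below is not the MIT architecture — real fractal generative models use learned autoregressive modules at every scale — but it shows the recursion: one shared rule, applied at every zoom level, expands a single coarse value into a full grid of pixels. All names and constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def refine(value, depth, size):
    """One shared 'generator': expand a coarse patch value into a 2x2 grid
    of finer values, then recurse. The same rule fires at every scale --
    the architectural idea behind fractal generative models."""
    if depth == 0:
        return np.full((size, size), value)
    out = np.zeros((size, size))
    half = size // 2
    for i in range(2):
        for j in range(2):
            # child value = parent value plus scale-dependent detail
            child = value + rng.normal(scale=0.5 ** (4 - depth))
            out[i*half:(i+1)*half, j*half:(j+1)*half] = refine(child, depth - 1, half)
    return out

image = refine(0.5, depth=3, size=8)   # an 8x8 "image", built coarse-to-fine
```

Because the refinement rule is shared across scales, the parameter count is constant no matter how deep the recursion goes — the toy analogue of the paper's small-network, high-resolution claim.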
Applications
High-resolution image synthesis without diffusion's denoising cost; structurally coherent terrain, vasculature, and crystal-growth simulators in scientific computing; texture synthesis for game engines that need self-similar detail at every camera distance; and a plausible blueprint for biological-scale generative models — DNA, vasculature, and neural tissue are all fractal in nature.
Future
Fractal generation suggests that future architectures may resemble the world they model. If neurons, lungs, river networks, and turbulence are all fractal, perhaps cognition itself is, and the right inductive bias is recursion, not depth.
Entropy
Order built by reversing thermal noise — running the second law backwards.
- Domain: Physics
- Mimics: Entropy
- Method: Diffusion Models · Boltzmann Machines
- Status: Mature
History
In 1985 Geoffrey Hinton and Terrence Sejnowski borrowed Ludwig Boltzmann's nineteenth-century statistical mechanics and built the Boltzmann Machine, a neural network whose neurons sampled at temperature. It worked, but barely scaled. The deeper idea — that intelligence could be cast as the reversal of an entropy-increasing process — waited thirty years. In 2015 Sohl-Dickstein et al. showed how to train a network to reverse a forward diffusion that gradually destroys data into pure Gaussian noise. In 2020 Ho, Jain, and Abbeel turned that into Denoising Diffusion Probabilistic Models, and overnight diffusion became the dominant paradigm for image, audio, and video synthesis. Every Stable Diffusion, Midjourney, Sora, and Veo image you have seen is, at root, a controlled act of physical violation: a system that locally decreases entropy.
Mechanism
Forward process: take an image and incrementally add Gaussian noise across hundreds or thousands of steps until it is indistinguishable from pure noise. Reverse process: train a neural network to predict, at each noise level, exactly what noise was added — and therefore how to subtract it. Sampling is then a walk backwards through the noise schedule: start with chaos, denoise one step, denoise again, and a coherent image emerges from a sequence of microscopic reversals. The mathematics is identical to Langevin dynamics in statistical physics; the network learns the score (the gradient of the log-density) of the data.
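The reverse walk can be demonstrated end to end on a toy one-dimensional "dataset". Because the data here is Gaussian, the score is available in closed form, so no network needs to be trained — a real diffusion model replaces `true_score` with a learned network. The schedule and constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 200
betas = np.linspace(1e-4, 0.05, T)      # forward noise schedule
alphas = 1.0 - betas
abar = np.cumprod(alphas)               # cumulative signal retention

m, s = 2.0, 0.5                          # "data" distribution: x0 ~ N(m, s^2)

def true_score(x, t):
    """Gradient of log p_t(x) for the noised marginal. For Gaussian data it
    is known exactly; a real model trains a network to approximate it."""
    mean = np.sqrt(abar[t]) * m
    var = abar[t] * s**2 + (1.0 - abar[t])
    return -(x - mean) / var

# Reverse process: start from pure noise, denoise step by step (DDPM).
n = 20000
x = rng.normal(size=n)
for t in range(T - 1, -1, -1):
    eps_hat = -np.sqrt(1.0 - abar[t]) * true_score(x, t)   # predicted noise
    x = (x - betas[t] / np.sqrt(1.0 - abar[t]) * eps_hat) / np.sqrt(alphas[t])
    if t > 0:
        x += np.sqrt(betas[t]) * rng.normal(size=n)

# x is now distributed approximately like the data: N(2, 0.5^2)
```

With the exact score, the backward walk reconstructs the data distribution from chaos; learning to approximate that score is the entire engineering burden of real diffusion models.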
Applications
Image generation (DALL·E 3, Stable Diffusion XL, Midjourney v6/v7, Flux), video (Sora, Veo 3, Kling, Runway), audio (AudioLM, Stable Audio), molecular design (AlphaFold 3 generates atomic coordinates with a diffusion module), protein binder design, robot motion planning, and physical simulation. Diffusion has become the universal solvent of generative modelling.
Future
Flow matching, rectified flow, and consistency models compress hundreds of denoising steps into one or two — pushing diffusion toward real-time generation. Diffusion language models (Mercury, LLaDA, Inception) challenge the autoregressive monopoly of GPT-style models. The deeper bet: every modality eventually becomes a diffusion problem.
Evolution by Natural Selection
Iterated selection and variation: the only process in the universe known to produce design without a designer.
- Domain: Biology
- Mimics: Evolution by Natural Selection
- Method: Genetic Algorithms · Neuroevolution
- Status: Mature
History
Darwin published On the Origin of Species in 1859, but Alan Turing was the first to ask whether the procedure could be mechanised. In 1948, in an unpublished report titled Intelligent Machinery, Turing sketched what we would now call an evolutionary search. John Holland formalised it in 1975 as the Genetic Algorithm. For decades GAs sat in optimisation textbooks, eclipsed by gradient descent. Then in 2017 OpenAI's Salimans, Ho, Chen, and Sutskever showed that evolution strategies could train Atari and MuJoCo policies competitively with gradient-based reinforcement learning, while parallelising almost perfectly across thousands of workers. Today neuroevolution drives AutoML, neural-architecture search, agent-population training, and — at planet scale — the search for entirely new model architectures.
Mechanism
Start with a population of candidate solutions, each encoded as a chromosome (parameter vector). Evaluate each against a fitness function. Select the best, recombine them (crossover), perturb them (mutation), and replace the population. Repeat for thousands of generations. The mathematics is uncannily close to stochastic gradient descent: both move parameters toward higher fitness via local exploration. The difference is that GAs work in non-differentiable, discontinuous, deceptive landscapes — where gradients lie or do not exist.
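The loop above, as a runnable sketch: a toy GA maximising the classic OneMax fitness (count of 1-bits), with elitist selection, single-point crossover, and per-bit mutation. All constants are illustrative.

```python
import random

random.seed(0)
GENES, POP, GENS, MUT = 30, 60, 80, 0.02

def fitness(chrom):                 # toy fitness: number of 1-bits ("OneMax")
    return sum(chrom)

def crossover(a, b):                # single-point crossover of two parents
    cut = random.randrange(1, GENES)
    return a[:cut] + b[cut:]

def mutate(chrom):                  # flip each bit with small probability
    return [g ^ 1 if random.random() < MUT else g for g in chrom]

pop = [[random.randint(0, 1) for _ in range(GENES)] for _ in range(POP)]
for _ in range(GENS):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:POP // 2]        # selection: the fitter half survives
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(POP - len(parents))]
    pop = parents + children        # replacement: next generation

best = max(pop, key=fitness)
```

OneMax has a smooth landscape, so gradient descent would also solve it; swap in a non-differentiable or deceptive fitness function and the same loop keeps working — which is the whole point of the paradigm.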
Applications
Antenna design at NASA (the ST5 spacecraft's bent-paperclip antenna was evolved), AutoML and neural-architecture search (Google's AmoebaNet), evolving game-playing strategies (the entire history of AlphaStar's league play is an evolutionary tournament), protein design (Adaptyv, EvolutionaryScale), curriculum and prompt evolution for LLM agents (OpenEvolve, AlphaEvolve), and the search for new mathematical proofs and algorithms (DeepMind's FunSearch discovered cap-set constructions larger than any previously known).
Future
Open-ended evolution — populations that never converge, that keep inventing new niches — is the secret behind Earth's biosphere and a leading hypothesis for how to escape capability plateaus in AI. Quality-diversity algorithms, POET, and OMNI-EPIC are building toward AI that breeds, not just trains.
Bugs whose colour matches the warm-amber target survive and breed. Mutation rate controls exploration vs exploitation.
The Biological Brain
Eighty-six billion specialised cells, abstracted into matrix multiplications.
- Domain: Brain · Structure
- Mimics: The Biological Brain
- Method: Artificial Neural Networks
- Status: Mature
History
In 1943 Warren McCulloch and Walter Pitts proved that networks of threshold units could compute any Boolean function. Frank Rosenblatt built the first physical neural network, the Perceptron, in 1958, and the New York Times prematurely announced the dawn of thinking machines. Two AI winters followed — one after 1969, when Minsky and Papert proved single-layer perceptrons could not represent XOR, another in the 1990s when shallow nets were outclassed by support vector machines. Geoffrey Hinton, Yann LeCun, and Yoshua Bengio kept the flame. In 2012 AlexNet won ImageNet by a margin so large that the field abandoned every alternative paradigm. Every model that followed — GPT, Llama, Gemini, Claude, Grok, DeepSeek, Qwen, Sora — is a descendant of that 1943 idea.
Mechanism
A neuron computes a weighted sum of its inputs, passes the sum through a nonlinearity (sigmoid, ReLU, GELU), and emits a number. Stack neurons in layers. Multiply the input vector by a weight matrix at each layer. Train the weights by backpropagation: compute the loss, propagate the gradient back through the chain rule, and update every parameter by a tiny step toward lower loss. Done at scale, with enough data and compute, this single idea has produced everything from face recognition to ChatGPT. The brain almost certainly does not work this way — but the abstraction works.
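The whole loop — weighted sums, nonlinearity, backpropagation — fits in a page. A minimal NumPy sketch (layer sizes, learning rate, and iteration count are arbitrary choices for this toy) that learns XOR, the very function single-layer perceptrons cannot represent:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)   # XOR targets

W1 = rng.normal(size=(2, 8)); b1 = np.zeros(8)    # input -> hidden weights
W2 = rng.normal(size=(8, 1)); b2 = np.zeros(1)    # hidden -> output weights
sig = lambda z: 1.0 / (1.0 + np.exp(-z))
lr = 1.0

for _ in range(10000):
    # forward pass: weighted sum, nonlinearity, layer by layer
    h = sig(X @ W1 + b1)
    out = sig(h @ W2 + b2)
    # backward pass: chain rule pushes the loss gradient through each layer
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(axis=0)

preds = (out > 0.5).astype(int)    # the network has learned XOR
```

Scale the same forward pass and gradient update by nine orders of magnitude and you have the training loop of a frontier model.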
Applications
Every machine-learning product on Earth. Cancer detection on radiographs, protein folding, autonomous driving, voice cloning, recommendation, fraud detection, ad ranking, real-time translation, drug discovery, weather forecasting (GraphCast outperforms physics-based numerical weather prediction), and — most consequentially — large language models.
Future
Spiking neural networks, neuromorphic chips (Loihi 2, IBM NorthPole), continuous-time models, and biologically plausible learning rules (predictive coding, forward-forward) are pushing artificial brains closer to organic ones. A separate frontier: brain-computer interfaces (Neuralink, Synchron) are now using neural networks to decode and re-encode the very organ that inspired them.
Gold edges = positive weights, indigo = negative. Edge thickness ∝ activation × weight. The same forward pass scaled 10⁹× gives you GPT-5.
Learning from History
John Locke's blank slate, written on by the entire internet.
- Domain: Mind · Empiricism
- Mimics: Learning from History
- Method: Pre-training · Fine-tuning · Distillation · CoT
- Status: Mature
History
Empiricism is the philosophical claim that all knowledge comes from experience. John Locke called the newborn mind a tabula rasa. David Hume argued that even causation is a habit inferred from repeated observation. For three centuries this view sat opposite rationalism. In 2017 Vaswani et al. published Attention Is All You Need, and within five years the empiricist programme had a complete computational instantiation: train a transformer on enough text and it absorbs grammar, world knowledge, common sense, theory of mind, and the rudiments of reasoning. GPT-3 (2020) showed that the absorption scaled with parameters; GPT-4 (2023) showed it crossed thresholds; Claude, Gemini, Llama, Qwen, DeepSeek, and Grok continue the lineage. Every modern LLM is a Lockean child raised on the internet.
Mechanism
Pre-training: predict the next token across a trillion-token corpus. The loss surface is shaped by every word humans have written. Fine-tuning: nudge the model on a smaller, curated dataset (instructions, code, dialogue). Distillation: train a smaller student to mimic a larger teacher's outputs, transferring capability into a tighter package. Chain-of-Thought (CoT): prompt the model to write its reasoning step by step, and watch performance jump on math, logic, and multi-step tasks. The four techniques compose — pretrain, fine-tune, distill, elicit CoT at inference — and every frontier model goes through some version of this assembly line.
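Next-token prediction itself is simple enough to demonstrate with counts. The toy below is a bigram model — a one-token-context caricature of pre-training (the corpus and names are invented for illustration). Frontier models differ in context length, architecture, and scale, not in the objective:

```python
from collections import Counter, defaultdict

corpus = ("the cat sat on the mat . the dog sat on the rug . "
          "the cat saw the dog .").split()

# "Pre-training": tally, for every token, what tends to follow it.
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def predict(prev):
    """Most likely next token given one token of context -- the same
    objective an LLM optimises with thousands of tokens of context."""
    return counts[prev].most_common(1)[0][0]
```

`predict("sat")` returns `"on"`, because that is the only continuation the corpus ever exhibited — knowledge absorbed purely from experience, the Lockean claim in fourteen lines.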
Applications
Everything LLM-shaped: coding assistants (Cursor, Claude Code, Copilot), customer support, search (Perplexity, Google AIO), writing, summarisation, translation, education, legal review, medical scribes, robotic policies (RT-2, π0), and the agentic systems that orchestrate them. The chat-completion API has become the new system call.
Future
The empiricist path is running into a wall: the internet only contains so much text, and the easy gains have been collected. The frontier is now synthetic data (models teaching models), continual learning (Atlas, Hierarchical Reasoning), multimodal pretraining (vision, audio, video, action), and curricula that resemble how children learn rather than how scrapers do.
Learning through Reward and Punishment
Edward Thorndike's law of effect, scaled to superhuman game-play and reasoning.
- Domain: Mind · Empiricism
- Mimics: Learning through Reward and Punishment
- Method: Reinforcement Learning
- Status: Active research
History
Edward Thorndike's puzzle boxes in 1898 demonstrated that cats learn through trial and reward — the law of effect. Skinner industrialised this into operant conditioning. In computer science, Richard Sutton and Andrew Barto formalised the agent-environment loop in the 1980s. TD-Gammon (1992) reached world-class backgammon by self-play. DeepMind's DQN beat Atari games in 2013; AlphaGo beat Lee Sedol in 2016; AlphaZero learned chess, shogi, and Go from scratch in 2017. Then in 2022 OpenAI applied RL not to games but to language: RLHF turned a raw GPT into ChatGPT. Since 2024, RL has been applied to LLM reasoning itself — o1, DeepSeek-R1, Grok-4 Reasoner, Claude's thinking mode — and reasoning has become a property you train with reward, not data.
Mechanism
An agent observes a state, picks an action, the environment returns a reward and the next state. The agent learns a policy — a mapping from state to action — that maximises cumulative future reward. The mathematics is dynamic programming applied to stochastic processes; the practical engineering is enormous. Modern variants include policy gradients (PPO, GRPO), actor-critic (SAC, A3C), model-based methods that learn a simulator of the world (Dreamer, MuZero), and inference-time reinforcement (the model self-explores at test time, as in o3 and the thinking models of 2025-26).
Applications
Robotics (every modern manipulation policy is RL-tuned), recommender systems, ad auctions, chip-floorplanning (Google's TPU layout was RL-designed), nuclear fusion plasma control (DeepMind controlling tokamaks), datacentre cooling, drug discovery, autonomous driving, and — increasingly — RLHF and RLVR on top of language models for safety and reasoning.
Future
Sutton's bet — that RL is the only paradigm scalable to AGI because it does not depend on a finite human-generated dataset — is being publicly tested. Self-play between LLMs (debate, adversarial verification), embodied multi-agent worlds (Genie 3 simulating universes for agents to grow up in), and reward models that themselves learn — these are the next decade's frontier.
Arm D pays best (66 %). A good RL algorithm finds it without being told. ε-greedy explores randomly; UCB explores by uncertainty; Thompson samples from each arm's posterior. They all converge — at different speeds.
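A demo like the one above reduces to a few lines of ε-greedy; UCB and Thompson sampling differ only in how the arm is chosen. The payout probabilities and constants here are invented for illustration:

```python
import random

random.seed(0)
# Four arms with hidden payout probabilities; arm D (index 3) pays best.
probs = [0.25, 0.40, 0.55, 0.66]
counts = [0] * 4
values = [0.0] * 4                  # running estimate of each arm's payout
eps = 0.1

for step in range(5000):
    if random.random() < eps:       # explore: pull a random arm
        arm = random.randrange(4)
    else:                           # exploit: pull the best estimate so far
        arm = max(range(4), key=lambda a: values[a])
    reward = 1.0 if random.random() < probs[arm] else 0.0
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]   # incremental mean

best_arm = max(range(4), key=lambda a: values[a])   # the agent's verdict
```

No one tells the agent which arm is best; the reward signal alone sculpts the value estimates until exploitation locks onto arm D — the law of effect in miniature.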
Symbolic Reasoning
From a handful of axioms, infinitely many sentences. From a handful of rules, the whole of mathematics.
- Domain: Mind · Rationalism
- Mimics: Symbolic Reasoning
- Method: Logic Programming · Theorem Proving · Chain-of-Thought
- Status: Active research
History
Plato held that knowledge is innate, recollected rather than learned. Descartes located certainty in pure thought. Chomsky argued that a finite set of rules generates an infinity of sentences. At the 1956 Dartmouth workshop, AI was born under the rationalist banner: Allen Newell and Herbert Simon's Logic Theorist proved theorems from Principia Mathematica; Prolog (1972) embodied the idea that programs *are* logic. The first AI winter ended the dream — symbolic systems were brittle, knowledge engineering was endless. Yet symbolic reasoning never died. Today it is having a renaissance: theorem provers (Lean, Coq, Isabelle) are coupled to LLMs (DeepMind's AlphaProof, AlphaGeometry, Harmonic's Aristotle), and chain-of-thought is recognised as a kind of soft symbolic search.
Mechanism
Define a domain in terms of symbols (variables, constants, relations) and inference rules (modus ponens, resolution, unification). Given a goal, search for a sequence of rule applications that derives it. The search is combinatorial and brutal, but the answers — once found — are certain. Modern neuro-symbolic systems delegate the search to a neural network and the verification to a deterministic checker, getting the best of both worlds. Lean 4 and Mathlib have become an unexpected fulcrum: AlphaProof won a silver medal at IMO 2024 by emitting Lean proofs; a year later DeepMind reached gold.
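The rule-application core can be sketched as forward chaining with modus ponens over ground facts. Real provers add unification, resolution, and heuristic search on top; the predicate names below are invented for illustration:

```python
def forward_chain(facts, rules):
    """Apply modus ponens until no new facts can be derived.
    rules: list of (premises, conclusion) pairs over ground atoms."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if set(premises) <= facts and conclusion not in facts:
                facts.add(conclusion)      # a new, certain derivation
                changed = True
    return facts

rules = [
    (["human(socrates)"], "mortal(socrates)"),
    (["mortal(socrates)", "philosopher(socrates)"], "finite_wisdom(socrates)"),
]
derived = forward_chain(["human(socrates)", "philosopher(socrates)"], rules)
```

Every fact in `derived` is certain given the axioms — the property that makes symbolic systems the verifier of choice in neuro-symbolic hybrids, where a neural proposer is allowed to be wrong and the checker is not.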
Applications
Formal verification of critical software and hardware (CompCert, seL4 microkernel, Intel chip verification), mathematics (Lean's Mathlib library now contains well over a hundred thousand theorems across more than a million lines of proof), legal reasoning, smart contracts, scientific theorem discovery, and program synthesis. Every safety-critical system in the world relies, somewhere, on a symbolic prover.
Future
Neuro-symbolic synthesis is the most likely path to mathematical AGI: an LLM proposes, a prover disposes. AlphaProof-style hybrids, when scaled, may write entirely new branches of mathematics. The harder dream — symbolic reasoning over the messy world, not just over formal axioms — remains open.
Analogical Reasoning
Everything thinks by similarity. The trick is computing it in a high-dimensional vector space.
- Domain: Mind · Analogy
- Mimics: Analogical Reasoning
- Method: Transformer · Attention
- Status: Mature
History
Aristotle wrote that the soul thinks in images and analogies. Douglas Hofstadter spent fifty years arguing that analogy is the core of cognition. Pedro Domingos called the analogizer school one of the five tribes of machine learning, with k-nearest-neighbours as its mascot. In 2017 the analogizer school accidentally won the entire field: Vaswani et al.'s Transformer paper replaced recurrence with pure attention — and attention is, mathematically, a soft k-nearest-neighbours over learned embeddings. Every modern frontier model is, in essence, a tower of analogy machines: for each token, find every other token that resembles me in some learned subspace, and update myself by their weighted average.
Mechanism
Each token in a sequence is projected into three vectors: query, key, value. The attention weight from token i to token j is softmax(Q_i · K_j / √d). Each token's new representation is the weighted sum of the values, weighted by the similarities. Stack this in parallel heads, then in series, and you have a transformer. The geometry: thinking is similarity-shaped, and reasoning is iterated similarity-finding. With enough layers and parameters, this mechanism can read, write, code, prove, plan, draw, see, hear, speak, and (debatably) understand.
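The similarity arithmetic is compact enough to write out in full. A minimal single-head sketch in NumPy (the dimensions and random inputs are illustrative; real transformers learn the Q/K/V projections and stack many heads and layers):

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: a soft nearest-neighbour lookup.
    Each row of Q asks 'who resembles me?'; rows of V supply the answer."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                    # similarity of every pair
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the keys
    return weights @ V, weights                      # weighted average + map

rng = np.random.default_rng(0)
Q = rng.normal(size=(6, 8))     # 6 tokens, 8-dimensional embeddings
K = rng.normal(size=(6, 8))
V = rng.normal(size=(6, 8))
out, w = attention(Q, K, V)
```

Each output row is a convex combination of the value rows: the token has literally updated itself by the weighted average of whatever resembles it, which is why attention is fairly described as a soft k-nearest-neighbours.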
Applications
GPT-5, Claude Opus 4.7, Gemini 3.5, Grok-4, DeepSeek V3.5, Qwen 3.5, Llama 4 — every frontier LLM. Vision transformers (ViT) ate the convolutional throne. AlphaFold's structure module is a transformer over residue pairs. Decision Transformer applies the architecture to RL. The transformer has become the universal Turing machine of deep learning.
Future
Linear attention (Mamba, RWKV), state-space models, Hyena, sliding-window attention, and ring attention are eating into the quadratic cost. A coming inflection: when context windows reach a trillion tokens, the attention mechanism becomes a kind of associative memory at planetary scale. Whether transformers remain dominant, or are overthrown by their own children, is the central architectural question of the 2030s.
| The | cat | sat | on | the | mat | because | it | was | tired | |
|---|---|---|---|---|---|---|---|---|---|---|
| The | 0.50 | 0.50 | ||||||||
| cat | 0.50 | 0.50 | ||||||||
| sat | 0.50 | 0.50 | ||||||||
| on | 0.98 | |||||||||
| the | 0.50 | 0.50 | ||||||||
| mat | 0.98 | |||||||||
| because | 0.98 | |||||||||
| it | 0.50 | 0.50 | ||||||||
| was | 0.50 | 0.50 | ||||||||
| tired | 0.98 |
Each row shows where one token "looks" for context. The "coreference" head wires 'it' back to 'cat' — the soul of analogical reasoning in modern LLMs.
Probabilistic Belief Update
Beliefs are probability distributions. Evidence reshapes them. Rationality is the arithmetic of revision.
- Domain: Mind · Bayesianism
- Mimics: Probabilistic Belief Update
- Method: Bayesian Inference · Probabilistic Programming
- Status: Active research
History
Thomas Bayes died in 1761; his theorem was published posthumously in 1763. For two centuries it was a curiosity, then a workhorse of statistics, then — in the 1990s and 2000s — a movement. Judea Pearl built probabilistic graphical models and earned a Turing Award. Pedro Domingos called the Bayesians one of the five tribes of machine learning. The empirical surge of deep learning has not displaced Bayesian thinking; it has absorbed it. Modern systems quietly use Bayesian ideas everywhere: dropout can be read as approximate variational inference, latent-variable and score-based generative models are Bayesian at heart, and the entire field of probabilistic programming (Pyro, NumPyro, Stan, Gen) carries the flag explicitly.
Mechanism
P(hypothesis | evidence) = P(evidence | hypothesis) · P(hypothesis) / P(evidence). Prior beliefs are multiplied by the likelihood of new data and renormalised. Repeat for every new datum. The procedure is provably optimal under the axioms of probability theory. The computational cost is the catch: exact Bayesian inference is intractable for most interesting models. Approximations — variational inference, MCMC (Hamiltonian Monte Carlo, NUTS), normalising flows, score matching — have made Bayesian thinking tractable at scale.
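For a coin with unknown bias, the whole procedure collapses to two counters — the Beta-Bernoulli conjugate pair, where Bayes' rule reduces to incrementing the prior's parameters. A minimal sketch (the flip sequence is invented for illustration):

```python
# Prior: Beta(1, 1), i.e. uniform belief over the coin's heads-probability.
a, b = 1.0, 1.0
flips = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]   # observed data: 1 = heads, 0 = tails

for f in flips:
    # Bayes' rule with a Bernoulli likelihood is exactly a count update:
    # posterior = Beta(a + total heads, b + total tails).
    a += f
    b += 1 - f

posterior_mean = a / (a + b)                           # 9 / 12 = 0.75
posterior_var = a * b / ((a + b) ** 2 * (a + b + 1))   # shrinks as data arrives
```

The diffuse prior sharpens with every flip: the posterior mean moves toward the observed frequency, and the posterior variance falls below the prior's — the arithmetic of rational revision in six lines.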
Applications
Drug-trial design, A/B testing at scale (every major tech company), search and recommendation, autonomous-vehicle perception (sensor fusion is Bayesian), medical diagnosis, astronomy (LIGO's gravitational-wave detection is Bayesian inference at exquisite precision), election forecasting, sports analytics, and — quietly — every LLM's token sampler (temperature, top-p, beam search are all forms of probabilistic decoding).
Future
The convergence of LLMs with Bayesian inference is the most under-discussed frontier. An LLM is, in one reading, a giant approximate Bayesian engine over the joint distribution of language. Active inference, the free-energy principle (Karl Friston), and predictive coding all bet that the brain — and therefore AGI — is fundamentally a Bayesian organ.
Belief starts diffuse (the prior). Each coin flip sharpens it. After enough flips, the posterior collapses onto the true bias — the arithmetic of rational revision.
Einsteinian Imagination
What machine could imagine itself riding alongside a beam of light?
- Domain: Mind · The Open Question
- Mimics: Einsteinian Imagination
- Method: (still missing)
- Status: Open question
History
At sixteen, Albert Einstein asked himself what he would see if he could ride alongside a light beam at light speed. The thought experiment produced no equations, no observations, no derivations — it was pure visualised imagination. Ten years later it produced special relativity. Faraday imagined lines of force; Galileo imagined dropping balls from a tower; Schrödinger imagined a cat in a box. The most important leaps in physics are not made by extrapolating data; they are made by inventing the picture in which the data finally makes sense.
Mechanism
We don't know. Imagination of this kind appears to involve simulating counterfactual physics in a mental model that the imaginer can manipulate, observe, and reason about. It is not interpolation from examples. It is not chain-of-thought through the training distribution. It is not a Bayesian update on observed data. Hofstadter calls it strange-loop self-reference; David Deutsch calls it explanatory creativity; Karl Popper called it the bold conjecture. Whatever it is, current AI systems mostly remix rather than invent. The gap is real, and naming it honestly is the first step to closing it.
Applications
If solved: original scientific discovery without human prompting, novel mathematics, new physics, art that is not in any training set, machines that pose their own questions. The economic value is incalculable because it is the value of inventing entire industries from nothing.
Future
Some bet that world-models (Sora, Veo, Genie) plus reinforcement learning at scale will eventually exhibit imagination as an emergent property. Others (Yann LeCun, François Chollet) argue that an entirely new ingredient is required — possibly some form of causal, intervention-based reasoning, or a discrete program-induction layer over the continuous neural substrate. ARC-AGI-3 keeps the goalpost honest.
No working artificial analogue exists. The honest atlas marks the gap rather than papering over it.
Emotion-Guided Cognition
Without feeling, there is no purpose. Without purpose, no thought.
- Domain: Mind · The Open Question
- Mimics: Emotion-Guided Cognition
- Method: (still missing)
- Status: Open question
History
William James in 1884 argued that emotion is the *perception* of bodily changes, not a separate cognitive category. Antonio Damasio's somatic-marker hypothesis, building on patients with damaged ventromedial prefrontal cortex, showed that without emotion, rational decision-making collapses — patients could deliberate forever but never choose. Affect is not the opposite of reason; it is the engine that makes reasoning terminate. Lisa Feldman Barrett's constructed emotion theory pushes further: emotions are predictions the brain makes about its own interoceptive future. AI today has no body, no homeostasis, no interoception, no felt urgency — and therefore, arguably, no genuine motivation.
Mechanism
Reward signals in RL are sometimes called proto-emotion, but a scalar reward is to feeling what a postage stamp is to a postal system. Genuine emotion appears to be the brain's continuous, multidimensional, embodied, predictive model of the organism's own future — a control system whose currency is survival. No current AI architecture has anything like this. What we have are crude proxies: reward signals, loss functions, KL penalties. They steer behaviour but they do not constitute motivation.
Applications
If solved: machines with their own goals, machines that can negotiate with humans as peers rather than as instruments, machines whose well-being deserves moral weight. The application space includes companionship, eldercare, education, therapy, and — most charged — partnership in scientific discovery. Without emotion, AI is a calculator; with it, AI is something we have not yet invented a word for.
Future
Embodied AI (humanoids, surgical robots, autonomous vehicles) will be the first systems forced to grapple with proxy-emotional states — fatigue, urgency, risk-aversion. Whether these proxies cross some phenomenal threshold into genuine feeling is the deepest open question in AI, and possibly in philosophy.
No working artificial analogue exists. The honest atlas marks the gap rather than papering over it.
What AI cannot yet imitate.
Two capacities of the human mind remain without a computational analogue: the bold visualised imagination that produced relativity, quantum mechanics, and natural selection; and the felt emotion that gives any thought a reason to terminate in action. Without imagination, AI cannot truly invent. Without emotion, AI cannot truly want.
The two missing paradigms — bold imagination and felt emotion — are not technical oversights. They are the atlas's two open questions. Without imagination, AI can interpolate but not invent. Without emotion, AI can compute but not care. Closing these gaps is what will distinguish the 21st century's late AI from its early AI.
Three centuries of borrowed intelligence.
1763 · Thomas Bayes's essay on probability is read to the Royal Society, two years after his death.
1859 · Darwin formalises evolution by natural selection — the only known process that produces design without a designer.
1898 · Cats learn by trial and reward. The law of effect is born.
1943 · The first mathematical model of a neuron. Every modern AI descends from this paper.
1956 · AI is named. McCarthy, Minsky, Newell, Simon, Shannon attend.
1958 · The first physical neural network. The NYT prematurely announces thinking machines.
1975 · The geometry that nature has used for four billion years acquires a word.
1975 · Evolution becomes a search procedure on a computer.
1985 · Hinton & Sejnowski apply statistical thermodynamics to learning.
1986 · Rumelhart, Hinton, Williams give deep networks a way to learn.
1997 · Symbolic search beats a world chess champion.
2012 · Deep learning ends the alternatives. The modern era begins.
2014 · Goodfellow's generator-vs-discriminator. The first photoreal fakes.
2015 · Sohl-Dickstein et al. show how to learn to reverse noise.
2016 · Reinforcement learning + tree search defeat a 9-dan Go champion.
2017 · Transformers replace recurrence. Every modern LLM is downstream of this paper.
2020 · Scaling laws hold. Diffusion becomes practical. The generative explosion begins.
2022 · RLHF turns a raw language model into a useful interlocutor. AI enters every home.
2023-24 · Multimodal reasoning at near-human level on broad benchmarks. Biology enters the AI age.
2024-25 · MIT shows recursion can replace depth. DeepMind shows symbolic reasoning at olympiad level.
2024-25 · Reinforcement learning is applied to reasoning itself. Models think before they speak.
Today · Eleven paradigms inventoried. Two open questions named honestly.
The next imitations.
Recursive architectures
Fractal-style generators may displace fixed-depth transformers, addressing arbitrarily high resolution and arbitrarily long context with constant parameter budgets.
Diffusion language models
Mercury, LLaDA, Inception challenge the autoregressive monopoly. Generating an entire response in a single denoising pass may be the next ChatGPT moment.
Open-ended evolution
Populations of agents that keep inventing new niches — POET, OMNI-EPIC, AlphaEvolve — may escape the data ceiling of internet-scale pretraining.
Test-time RL
Models that reinforce-learn on the spot, against their own verifier, may compress years of training into seconds of inference.
Neuro-symbolic synthesis
LLM proposes, theorem prover disposes. Lean coupled to GPT-class models may, within a decade, take serious runs at problems of Millennium Prize difficulty.
World models
Sora, Veo 3, Genie 3 are early. A trillion-parameter physical-world simulator, queryable by agents, is the platform of the late 2020s.
Embodied homeostasis
Humanoids, surgical robots, and autonomous vehicles will be the first systems forced to grapple with felt-state proxies. Whether these proxies cross into genuine emotion is the deepest open question.
Imagination as architecture
If imagination is causal intervention on a learned world model, then the missing ingredient may be a do-operator over latent variables — the synthesis Pearl spent his career arguing for.
Where each paradigm lives in production.
Medicine
- AlphaFold 3 (diffusion + transformer) — universal biomolecular structure
- Radiology diagnosis (CNN + ViT) — at-or-above radiologist accuracy in mammography, chest X-ray
- Drug discovery (RL + Bayesian) — Insilico, EvolutionaryScale, Iambic
Science
- GraphCast (transformer) — outperforms physics weather forecasts
- AlphaProof (RL + symbolic) — IMO silver 2024, gold-track 2025
- FunSearch (LLM + evolution) — cap-set constructions larger than any previously known
Code
- Claude Code, Cursor, Copilot, Windsurf — empiricist pretraining + RL
- AlphaCode 2 — competitive programming above 85th percentile
Creative
- Midjourney v7, Flux 1.1, DALL·E 3 — diffusion
- Sora 2, Veo 3, Kling 2.0 — diffusion + transformer over video
- Suno, Udio, ElevenLabs — audio diffusion + flow matching
Robotics
- Tesla Optimus, Figure 02, Unitree G1 — pretraining + RL + world models
- Waymo, Tesla FSD, Pony — Bayesian sensor fusion + neural perception
Industry
- Datacentre cooling (RL) — Google saved 40% on cooling energy
- Chip floorplanning (RL) — Google's TPU v5 was RL-laid-out
- Tokamak plasma control (RL) — DeepMind controls fusion reactors
The map is not the territory.
But for a generation building artificial minds, a map is the difference between an exploration and a wander. This is yours. Use it. Share it. Add to it.