The secret of lightness

June 17, 2026

We do not yet understand how to train language models! This seems obvious to me, because it ought to be possible — it will be possible — to produce a tight, capable “programmatic reasoner” with something like 30 billion parameters.

The famous Scaling Laws only describe transformer models — nobody knows what weird architectures are waiting out there in the universe, with different responses to compute, data, and more. Nobody knows what kind of clever training regimes might coax huge models into better (more compact) shapes.

A fair objection goes like this: Robin, remember that the human brain has hundreds of trillions of “parameters”, in the form of synapses. Our largest models haven’t even approached that scale yet. Do you want us to architect a beetle’s brain, or SUPERINTELLIGENCE?

(Before proceeding, Robin replies: well, I wouldn’t mind starting with the beetle … )

The obvious response to this objection is that language models aren’t brains. Contra the brain, they operate with both handicaps (e.g. power consumption) and advantages (e.g. speed). More than linearly “better” or “worse”, though, they are just different! And so we should expect different properties, different capabilities … different numbers.

Hanging over everything, the recognition: the day that this level of intelligence moves out to the edge — to laptops and iPhones and toaster ovens — is the day the business model for centralized AI collapses like a soufflé. Lo, the data centers rise … yet they could be emptied in a year by one idea, from one lab or garage. Wild to think about.

A true believer in the Scaling Laws doesn’t think such an idea is possible — that’s my sense of it, anyway. Maybe I’m mischaracterizing the position. But I believe in the one idea, the one garage; I’m with Calvino:

Were I to choose an auspicious image for the new millennium, I would choose [ … ] the sudden agile leap of the poet-philosopher who raises himself above the weight of the world, showing that with all his gravity he has the secret of lightness, and that what many consider to be the vitality of the times — noisy, aggressive, revving and roaring — belongs to the realm of death, like a cemetery for rusty old cars.

Of course, this is just a post by a child of the 20th century, to whom the prefix “giga-” still sounds unspeakably plush. Even so: if you tell me you can’t fit a supercapable model, one poised comfortably on today’s performance frontier, into 30 billion parameters, I will tell you, try harder!
