This is a post from Robin Sloan’s lab blog & notebook.

The secret of lightness

June 17, 2026

We do not yet under­stand how to train lan­guage models! This seems obvious to me, because it ought to be possible — it will be possible — to pro­duce a tight, capable “programmatic reasoner” with some­thing like 30 bil­lion parameters.

The famous Scaling Laws only describe trans­former models — nobody knows what weird archi­tec­tures are waiting out there in the universe, with dif­ferent responses to compute, data, and more. Nobody knows what kind of clever training regimes might coax huge models into better (more compact) shapes.

A fair objec­tion goes like this: Robin, remember that the human brain has hun­dreds of tril­lions of “parameters”, in the form of synapses. Our largest models haven’t even approached that scale yet. Do you want us to archi­tect a beetle’s brain, or SUPERINTELLIGENCE?

(Before proceeding, Robin replies: well, I wouldn’t mind starting with the beetle … )

The obvious response to this objec­tion is that lan­guage models aren’t brains. Contra the brain, they operate with both hand­i­caps (e.g. power consumption) and advan­tages (e.g. speed). More than lin­early “better” or “worse”, though, they are just dif­ferent! And so we should expect dif­ferent properties, dif­ferent capabilities … dif­ferent numbers.

Hanging over everything, the recognition: the day that this level of intel­li­gence moves out to the edge — to lap­tops and iPhones and toaster ovens — is the day the busi­ness model for cen­tral­ized AI col­lapses like a soufflé. Lo, the data cen­ters rise … yet they could be emp­tied in a year by one idea, from one lab or garage. Wild to think about.

A true believer in the Scaling Laws doesn’t think such an idea is possible — that’s my sense of it, anyway. Maybe I’m mis­char­ac­ter­izing the position. But I believe in the one idea, the one garage; I’m with Calvino:

Were I to choose an aus­pi­cious image for the new millennium, I would choose [ … ] the sudden agile leap of the poet-philosopher who raises him­self above the weight of the world, showing that with all his gravity he has the secret of lightness, and that what many con­sider to be the vitality of the times — noisy, aggressive, revving and roaring — belongs to the realm of death, like a ceme­tery for rusty old cars.

Of course, this is just a post by a child of the 20th century, to whom the prefix “giga-” still sounds unspeakably plush. Even so: if you tell me you can’t fit a supercapable model, one poised comfortably on today’s performance frontier, into 30 billion parameters, I will tell you, try harder!
