Reasons-ing models
It often happens for me that, after I write something, the most important part turns out to be a stop along the way, an incidental phrase. Case in point:
In the warmup to my recent post about the foundational question of language models, I wrote about what the models are doing. I think “modeling text” undersells the mechanism. The systems I tinkered with back in the late 2010s “modeled text”; these new ones, while mechanically very similar, are qualitatively, viscerally different.
So, instead, I decided to say that
language models collate and precipitate all the diverse reasons for writing, across a huge swath of human activity and aspiration
and I chose the word “reasons” with care. In fact, I think the big language models “see through” the veil of text, into the diverse reasons humans had, and have, for producing it. It’s precisely the diversity of those reasons that makes those models so capable —
Much has been made of next-token prediction, the hamster wheel at the heart of everything. (Has a simpler mechanism ever attracted richer investments?) But, to predict the next token, a model needs a probable word, a likely sentence, a virtual reason —
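To make the hamster wheel concrete: next-token prediction just means asking, given some context, which word is probable here? A toy bigram counter (an illustration only, nothing like a real language model) shows the shape of the question:

```python
from collections import Counter, defaultdict

# A toy illustration of next-token prediction: given the context,
# which word is probable here? (Real models condition on far more
# context and learn their statistics rather than counting them.)
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count bigram frequencies: how often each word follows each context word.
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def next_token_distribution(context):
    """Probability of each candidate next token, given one word of context."""
    c = counts[context]
    total = sum(c.values())
    return {tok: n / total for tok, n in c.items()}

print(next_token_distribution("the"))
# "the" is followed by: cat (twice), mat (once), fish (once)
```

The counting is trivial; the essay’s claim is that doing this well, at scale, forces a model toward something richer than statistics: a probable word, a likely sentence, a virtual reason.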
In this view, the emergence of super-capable new models is less about reasoning and more about “reasons-ing”: modeling the different things humans can want, along with the different ways they can pursue them … in writing.
Reasons-ing, not reasoning. Playful turns of this kind can seem airy and frivolous, entirely linguistic … and, okay, they usually are … but this one has changed the way I think about these models, so I offer it to you, frivolous or not.
Naturally, a language model’s reasons-ing is bounded by its training data. As it happens, a useful fraction of human desire and action is encoded in writing, produced by someone, somewhere, sometime. But of course this map of reasons is far from complete.
One can easily imagine a vast trove of video, showing humans doing all sorts of different things for different reasons. If it were sufficiently diverse, and if it could be processed, such a trove could also inform a process of reasons-ing, and the reasons would be different. Presently, both of those “if”s are very far off. What distinguishes text is its availability and tractability.
A good question might be, can language models develop and pursue truly new reasons for writing? Probably not. How do humans develop and pursue truly new reasons for writing? I’m not sure. I do know it’s one of the most interesting and important things humans can do. Think of the emergence of written law, the birth of the novel; think of double-entry bookkeeping, haiku. (It’s interesting to note the degree to which those things all connect to, and rely on, the physical world. I mean, maybe that’s essential: the reasons exist before, and/or closer to ground-floor reality than, the writing itself.)
I think this playful turn cuts both ways. On one hand, it grants the language models richer internal universes, as they “see through” the veil of text into deeper causes, underlying reasons. On the other hand, it cautions you that the models can, for sure, fool you into thinking they’re reasoning, when they are only nimbly reasons-ing.