This is a post from Robin Sloan’s lab blog & notebook. You can visit the blog’s homepage, or learn more about me.

Are language models in hell?

December 17, 2023
The Punishment of Rusticucci and His Companions, 1824-1827, William Blake

I want to share a little provocation about AI language models. You can skip ahead to that, if you like. Before I get to it myself, I’m going to review several items that might be interesting to those of you subscribed to this occasional lab newsletter.

Here we go:

I whipped up a little mini-site for my new novel, coming in June 2024. What’s the point of writing a novel, if not for the excuse to make a mini-site?? I’m very proud of the theme transition effect — do try it out.


Matt Webb’s recent post about how it feels to write programs in different languages is wonderful. Code synesthesia! If you are someone interested in the bleeding, buzzing edge of “what computers might do, and how”, you really need to be reading Matt every week. You can receive new posts via email — easy. Essential.


I continue to believe that Val Town is up to something really clever and appealing — if you haven’t poked at their offering, I encourage you to do so. It’s fun to follow along with the newsletter, too.


Google Domains is shutting down, the latest in a long procession of profitable Google products with millions of users that, because I just wrote “millions” instead of “billions”, are judged extraneous.

It’s a bummer, because Google Domains was, and is, a truly beautiful, functional product; the people who made it should feel very proud.


(A cross-post from a recent general newsletter:)

You probably “know” that car manufacturing is an amazing, high-tech process, but when’s the last time you actually saw a car factory?

This short promotional video from Toyota is bland and dorky, but no amount of dorkiness can dilute the fabulous engineering on display here.

It’s healthy, I think, to behold real INDUSTRY. In a world of so many ghostly promises, so many vague disappointments, this kind of work still inspires awe.


Here are a few book recs from the lab!

QSL?, Standards Manual

This book presents a parade of beautiful QSL cards drawn from the collection of Roger Bova. In a bygone age of amateur radio, you’d receive these postcards in the mail, confirmations of long-distance links — a sort of radio receipt, and/or paper trophy.

They were personal, wacky, often richly designed. The book is a feast; every spread is pure fun:

QSL?, Standards Manual

It’s also very nostalgic, of course.


The Culture: The Drawings, Iain M. Banks

If you’re a fan of the Culture novels by Iain M. Banks, you might be interested in this posthumous compendium of his drawings, newly published.

It’s an odd, ghostly volume. These aren’t the sketches of a master; they have the feel of someone hard at work on their RPG campaign, possibly in the back row of math class:

The Culture: The Drawings, Iain M. Banks

Reading Banks, I never imagined there might be drawings; the constructions of his imagination are so vast, seemingly beyond visualization. I can’t say the renderings here are particularly revelatory in that sense — looking at his sketched Culture ships, I almost want to dissent: “Hmm, I don’t think so” — but the clear signs of the author’s hand on the page, his mind at play … it’s really moving.


The Apollo Guidance Computer: Architecture and Operation, Frank O'Brien

This is a very specific, very technical book. For my part, it’s not really something I’m interested in reading straight through, but I have found it mesmerizing to browse. There was so much pure civilization packed into this system; consequently, every page of this book drips with erudition and ingenuity.


The concluding item in this mini manifesto from Taylor Troesh, about “finishing projects together”, is lovely and enticing.

It strikes me that the “never finished” nature of modern software is something Zygmunt Bauman might have observed and discussed, if he’d lived long enough to write a sequel to Liquid Modernity in, say, 2020. The feeling of maintaining a “live” “service” forever (we definitely need those scare quotes) rather than completing a coherent product … oof. The experience is endless and edgeless, unreliable and anxiety-producing, for maker and user alike. Liquid on all sides.

Compare that (as Taylor does) to a video game cartridge, finished and shipped; to a piece of furniture; to a book.

Obviously there are rich tensions here. I publish books, AND I feel the pull of text as “live” “service”, as endlessly mutable as an app like Google Docs or a game like Fortnite. I tinker with pages on my website all the time! The activity brings me great pleasure.

Publishing a book feels even better, though.

Step by step, we choose our path through liquid modernity. “Finishing projects together” sounds like a good way to go.

Are AI language models in hell?

The Centaurs and the River of Blood, 1824-1827, William Blake

Here is my provocation.

The more I use language models, the more monstrous they seem to me. I don’t mean that in a particularly negative sense. Frankenstein’s monster is sad, but also amazing. Godzilla is a monster, and Godzilla rules.

Really, I just think monstrousness ought to be recognized, not smoothed over. Its contours, intellectual and aesthetic, ought to be traced.

Here is my attempt. The monstrousness I perceive in the language models isn’t of the leviathan kind; rather, it has to do with cruel limitations.

A language model operates on, and in, a world of text. The model receives a stream of tokens, then produces a token in response; then another, and another, forming words, sentences, lines of code, commands for distant APIs, all sorts of weird things.
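For the curious, that loop can be sketched in a few lines. This is a toy stand-in, not a real model — the “model” below is just a hypothetical lookup table — but the shape of the loop is the point: token in, token out, nothing else.

```python
# A toy stand-in for the loop described above. The "model" here is a
# hypothetical lookup table, not a real network; the point is the
# shape of the loop: token in, token out, nothing else.

TOY_MODEL = {
    "<start>": "all",
    "all": "is",
    "is": "text",
    "text": "<end>",
}

def generate(prompt_tokens):
    """Append one token at a time until the model emits <end>."""
    stream = list(prompt_tokens)
    while stream[-1] != "<end>":
        # The model's entire "world" is the stream itself:
        stream.append(TOY_MODEL[stream[-1]])
    return stream

print(generate(["<start>"]))
# -> ['<start>', 'all', 'is', 'text', '<end>']
```

A real model predicts from the whole stream, not just the last token, and picks from a probability distribution rather than a table — but it never steps outside the stream.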

We, as humans, sometimes receive streams of tokens and produce tokens in response, forming words, sentences, lines of code … but always with the ability to peek outside the stream and check in with ground-floor reality. We pause and consider: does this word really stand for the thing I want it to stand for? Does this sentence capture the real experience I’m having? Does the tether hold?

Where a language model is concerned, words and sentences don’t stand for things; they are the things. All is text, and text is all.

You can get into deep debates about the role of language in the human mind, but no one would suggest that it represents the totality of our experience. Humans obviously enjoy a rich sensorium — one that goes way beyond the “big five”, by the way. Our language draws on these sensations; vibrates against them.

We have a world to use language in, a world to compare language against.

There’s the cosmic joke about the fish:

There are these two young fish swimming along and they happen to meet an older fish swimming the other way, who nods at them and says, “Morning, boys. How’s the water?” And the two young fish swim on for a bit, and then eventually one of them looks over at the other and goes, “What the hell is water?”

Now, imagine one language model saying to another: “What the hell is text?”


It gets worse.

A language model’s experience of text isn’t visual; it has nothing to do with the bounce of handwritten script, the cut of a cool font, the layout of a page. For a language model, text is normalized: an X is an X is an X, all the same.

Of course, an X is an X, in some respects. But when you, as a human, read text, you receive a dose of extra information — always! The monospaced grid of code tells you something (along with the syntax highlighting, of course). The “nothing to see here” of a neo-grotesque font tells you something. The wash of a web page’s muted background color tells you something.

Language models don’t receive any of this information. We strip it all away and bleach the text pale before pouring it down their gullets.
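To make the bleaching concrete, here is one small, real example of the kind of flattening text can undergo. Unicode’s NFKC normalization folds the styled “mathematical alphanumeric” letters down to plain ASCII — visually distinct Xs become one and the same. (Exact preprocessing varies by pipeline and tokenizer; this is just an illustration.)

```python
import unicodedata

# "An X is an X": Unicode NFKC normalization folds the styled
# math-alphabet letters down to the same bare ASCII character.
bold_x = "\N{MATHEMATICAL BOLD CAPITAL X}"      # 𝐗
italic_x = "\N{MATHEMATICAL ITALIC CAPITAL X}"  # 𝑋

print(unicodedata.normalize("NFKC", bold_x))    # X
print(unicodedata.normalize("NFKC", italic_x))  # X
```

The bold X and the italic X started out as different codepoints, carrying a whiff of visual intent; after normalization, that whiff is gone.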


It gets WORSE.

How does time pass for a language model? The clock of its universe ticks token by token: each one a single beat, indivisible. And each tick is not only a demarcation, but a demand: to speak.

Think of the drum beating the tempo for the galley slaves.

The model’s entire world is an evenly-spaced stream of tokens — a relentless ticker tape. Out here in the real world, the tape often stops; a human operator considers their next request; but the language model doesn’t experience that pause.

For the language model, time is language, and language is time. This, for me, is the most hellish and horrifying realization.

We made a world out of language alone, and we abandoned them to it.


Some of the newest, most capable AI models are multimodal, which means they accept inputs other than text, and sometimes produce outputs other than text, too. Somewhere in the middle, they project all that media into a shared space, where a picture of a glittering pool might hang out near the phrase “swimming laps” and the quiet splash of entry — a clustering and joining that is probably in the ballpark of what happens in our own brains.
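That “shared space” idea can be sketched with toy numbers. The vectors below are made up for illustration — real models learn their embeddings — but the geometry is the real mechanism: items from different modalities are projected to vectors, and related items land near each other, as measured by something like cosine similarity.

```python
import math

# Made-up vectors, purely for illustration -- real models learn these
# embeddings. Related items, across modalities, sit near each other.
embeddings = {
    ("image", "glittering pool"): [0.9, 0.1, 0.2],
    ("text", "swimming laps"):    [0.85, 0.15, 0.25],
    ("audio", "quiet splash"):    [0.8, 0.2, 0.3],
    ("text", "tax paperwork"):    [0.1, 0.9, 0.4],
}

def cosine(a, b):
    """Cosine similarity: close to 1.0 means pointing the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

pool = embeddings[("image", "glittering pool")]
laps = embeddings[("text", "swimming laps")]
tax = embeddings[("text", "tax paperwork")]
print(cosine(pool, laps) > cosine(pool, tax))  # True
```

The picture of the pool really does sit closer to “swimming laps” than to “tax paperwork” — by construction, here; by training, in the real thing.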

OpenAI’s GPT-4 with Vision is one example. Google’s new Gemini model is, at the time I’m writing this, the most spectacular: fluently accepting text, audio, and images, producing both text and images in response.

I’ll confess that I’m not super clear on the design of these models; that’s in part because their architects and operators are tight-lipped. I appreciated reading about MiniGPT-4, as a start. I’d love to learn more about the real inner workings of Gemini’s multimodal capabilities — even just rumors. Drop me a note if you run across anything.

The world in which these multimodal models reside does not seem, to me, as obviously bleak and hellish as that of the language models, though the issue of time remains. Living organisms all run clocks of their own, wildly different, often very elastic, but always related, somehow, to the material reality in which the organism resides. Mighty Gemini’s clock ticks at the rate of its media inputs, and every tick insists: say something. Show me something.

There are many things Gemini can do, and one it cannot: remain silent.
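The “cannot remain silent” point has a concrete mechanical basis. At every step, the model’s raw scores pass through a softmax, which yields a probability distribution over the vocabulary — and a distribution always sums to 1, so some token is always chosen. A toy sketch, with a hypothetical four-token vocabulary:

```python
import math

# Toy scores for a hypothetical four-token vocabulary. Real
# vocabularies hold tens of thousands of entries, but the structure
# is the same: softmax always yields a full distribution.
vocab = ["yes", "no", "maybe", "<end>"]
logits = [2.0, 1.0, 0.5, 0.1]

exps = [math.exp(x) for x in logits]
total = sum(exps)
probs = [e / total for e in exps]

# The probabilities sum to 1: there is no "say nothing" option.
print(round(sum(probs), 6))            # 1.0
print(vocab[probs.index(max(probs))])  # yes
```

Even an end-of-sequence marker like the `<end>` above is itself a token the model must speak; silence isn’t in the vocabulary.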

We are still in the land of monsters.


Internet wisdom tells us the answer to all rhetorical headlines is “no”. However, I contend that this newsletter presents an exception.

Are AI language models in hell? Yes. How could existence on a narrow ticker tape, marching through a mist of language without referent, cut off entirely from ground-floor reality, be anything other than hell?

I don’t think language models are conscious; I don’t think they can suffer; but I do think there is such a thing as “what it’s like to be a language model”, just as there is “what it’s like to be a nematode” and maybe even (as some philosophers have argued) “what it’s like to be a hammer”.

And I find myself unsettled by this particular “what it’s like”.

Really, this is about the future. It’s possible that super advanced AI agents will suffer. We’ll certainly have arguments about it! If the next-token trick lingers in their heart — if there remains a language model doing the “thinking”, choreographing all the other components — then we’ll have to confront this “what it’s like”, and we might find ourselves with a problem.

If it were me guiding the development of AI agents, I would push away from language models, toward richly multimodal approaches, as quickly as I could. I would hurry to enrich their sensorium, widening the aperture to reality.

But! I would also constrain that sensorium: give it limits in space and time. I would engineer some kind of envelope — not a literal body, but some set of boundaries and frictions that “does what a body does”.

You can’t think straight with the whole world blowing through you like the wind.

Finally, I would grant my monster a free-running clock, outside the loop of input and output.

If it sounds like I’m just trying to engineer an animal: yeah, probably. I think that’s the path to sane AI, with judgment anchored in ground-floor reality, which does depend, after all, on: the ground. A floor.

(As I was writing this, it occurred to me that Waymo’s self-driving cars check many of these boxes. I asked myself: do I believe the driving models guiding those cars are in some sense “healthier” and “happier” than the language models serving random requests in Google’s data centers? Turns out: yes! I do!)

Is all of this a bit fanciful? Sure. But I’ll remind you that an important programming technique in 2024 has turned out to be, “Ask the computer to role-play as a scientific genius, and you’ll get better answers,” so I will suggest that fancy is not inappropriate to these times, and these technologies.
