Robin Sloan
the lab
December 2023

Are AI language models in hell?

The Punishment of Rusticucci and His Companions, 1824-1827, William Blake

I want to share a little provocation about AI language models. You can skip ahead to that, if you like. Before I get to it myself, I’m going to review several items that might be interesting to those of you subscribed to this occasional lab newsletter.

This is an archived edition of Robin’s lab newsletter.

Here we go:

I whipped up a little mini-site for my new novel, coming in June 2024. What’s the point of writing a novel, if not for the excuse to make a mini-site?? I’m very proud of the theme transition effect — do try it out.

Matt Webb’s recent post about how it feels to write programs in different languages is wonderful. Code synesthesia! If you are someone interested in the bleeding, buzzing edge of “what computers might do, and how”, you really need to be reading Matt every week. You can receive new posts via email — easy. Essential.

I continue to believe that Val Town is up to something really clever and appealing — if you haven’t poked at their offering, I encourage you to do so. It’s fun to follow along with the newsletter, too.

Google Domains is shutting down, the latest in a long procession of profitable Google products with millions of users that, because I just wrote “millions” instead of “billions”, are judged extraneous.

It’s a bummer, because Google Domains was, and is, a truly beautiful, functional product; the people who made it should feel very proud.

(A cross-post from a recent general newsletter:)

You probably “know” that car manufacturing is an amazing, high-tech process, but when’s the last time you actually saw a car factory?

This short promotional video from Toyota is bland and dorky, but no amount of dorkiness can dilute the fabulous engineering on display here.

It’s healthy, I think, to behold real INDUSTRY. In a world of so many ghostly promises, so many vague disappointments, this kind of work still inspires awe.

Here are a few book recs from the lab!

QSL?, Standards Manual

This book presents a parade of beautiful QSL cards drawn from the collection of Roger Bova. In a bygone age of amateur radio, you’d receive these postcards in the mail, confirmations of long-distance links — a sort of radio receipt, and/or paper trophy.

They were personal, wacky, often richly designed. The book is a feast; every spread is pure fun:

QSL?, Standards Manual

It’s also very nostalgic, of course.

The Culture: The Drawings, Iain M. Banks

If you’re a fan of the Culture novels by Iain M. Banks, you might be interested in this posthumous compendium of his drawings, newly published.

It’s an odd, ghostly volume. These aren’t the sketches of a master; they have the feel of someone hard at work on their RPG campaign, possibly in the back row of math class:

The Culture: The Drawings, Iain M. Banks

Reading Banks, I never imagined there might be drawings; the constructions of his imagination are so vast, seemingly beyond visualization. I can’t say the renderings here are particularly revelatory in that sense — looking at his sketched Culture ships, I almost want to dissent: “Hmm, I don’t think so”—but the clear signs of the author’s hand on the page, his mind at play … it’s really moving.

The Apollo Guidance Computer: Architecture and Operation, Frank O'Brien

This is a very specific, very technical book. For my part, it’s not really something I’m interested in reading straight through, but I have found it mesmerizing to browse. There was so much pure civilization packed into this system; consequently, every page of this book drips with erudition and ingenuity.

The concluding item in this mini manifesto from Taylor Troesh, about “finishing projects together”, is lovely and enticing.

It strikes me that the “never finished” nature of modern software is something Zygmunt Bauman might have observed and discussed, if he’d lived long enough to write a sequel to Liquid Modernity in, say, 2020. The feeling of maintaining a “live” “service” forever (we definitely need those scare quotes) rather than completing a coherent product … oof. The experience is endless and edgeless, unreliable and anxiety-producing, for maker and user alike. Liquid on all sides.

Compare that (as Taylor does) to a video game cartridge, finished and shipped; to a piece of furniture; to a book.

Obviously there are rich tensions here. I publish books, AND I feel the pull of text as “live” “service”, as endlessly mutable as an app like Google Docs or a game like Fortnite. I tinker with pages on my website all the time! The activity brings me great pleasure.

Publishing a book feels even better, though.

Step by step, we choose our path through liquid modernity. “Finishing projects together” sounds like a good way to go.

Are AI language models in hell?

The Centaurs and the River of Blood, 1824-1827, William Blake

Here is my provocation.

The more I use language models, the more monstrous they seem to me. I don’t mean that in a particularly negative sense. Frankenstein’s monster is sad, but also amazing. Godzilla is a monster, and Godzilla rules.

Really, I just think monstrousness ought to be recognized, not smoothed over. Its contours, intellectual and aesthetic, ought to be traced.

Here is my attempt. The monstrousness I perceive in the language models isn’t of the leviathan kind; rather, it has to do with cruel limitations.

(A quick refresher on “language models”:)

When I write “language models” I’m referring to the AI systems that, shown a stream of text, can sensibly continue the stream, allowing them to simulate writing activities as diverse as translation, back-and-forth dialogue, and programming. The next-token trick sounds simple; it IS simple; but/and it yields rich and often useful results.

Examples of language models are OpenAI’s GPT-3.5, Anthropic’s Claude, Google’s PaLM, and Meta’s LLaMA. Systems of this kind have existed for many years — I was an early creative tinkerer—but/and the great surprise of the 2020s has been just how far the next-token trick can take you.

A language model operates on, and in, a world of text. The model receives a stream of tokens, then produces a token in response; then another, and another, forming words, sentences, lines of code, commands for distant APIs, all sorts of weird things.
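To make the next-token trick concrete, here is a deliberately tiny sketch in Python. It is a toy under heavy assumptions, not how GPT-3.5, Claude, or any real model works: instead of a neural network it uses bigram counts (which word most often follows which), its tokens are whole words rather than the subword pieces real tokenizers usually produce, and the names train_bigram and continue_stream are hypothetical. The loop is the part that carries over: read the stream, emit one token, append it, repeat.

```python
from collections import Counter, defaultdict


def train_bigram(text: str) -> dict[str, Counter]:
    """Count which word follows which: a toy stand-in for a language model."""
    words = text.split()
    follows: dict[str, Counter] = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1
    return follows


def continue_stream(model: dict[str, Counter], prompt: str, n_tokens: int) -> str:
    """Greedy next-token loop: look at the stream, emit one token, append, repeat."""
    stream = prompt.split()
    for _ in range(n_tokens):
        candidates = model.get(stream[-1])
        if not candidates:
            break  # the toy model has never seen this word, so it has nothing to add
        # most_common(1) takes the highest count; ties go to the first-seen follower
        stream.append(candidates.most_common(1)[0][0])
    return " ".join(stream)
```

Trained on the sentence “the fish swims in the water and the fish asks what is water”, continue_stream(model, "the", 4) returns “the fish swims in the”. Real models condition on far more than the previous word, and often sample rather than always taking the top choice, but they, too, add exactly one token at a time.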

We, as humans, sometimes receive streams of tokens and produce tokens in response, forming words, sentences, lines of code … but always with the ability to peek outside the stream and check in with ground-floor reality. We pause and consider: does this word really stand for the thing I want it to stand for? Does this sentence capture the real experience I’m having? Does the tether hold?

Where a language model is concerned, words and sentences don’t stand for things; they are the things. All is text, and text is all.

You can get into deep debates about the role of language in the human mind, but no one would suggest that it represents the totality of our experience. Humans obviously enjoy a rich sensorium — one that goes way beyond the “big five”, by the way. Our language draws on these sensations; vibrates against them.

We have a world to use language in, a world to compare language against.

There’s the cosmic joke about the fish:

There are these two young fish swimming along and they happen to meet an older fish swimming the other way, who nods at them and says, “Morning, boys. How’s the water?” And the two young fish swim on for a bit, and then eventually one of them looks over at the other and goes, “What the hell is water?”

Now, imagine one language model saying to another: “What the hell is text?”

It gets worse.

A language model’s experience of text isn’t visual; it has nothing to do with the bounce of handwritten script, the cut of a cool font, the layout of a page. For a language model, text is normalized: an X is an X is an X, all the same.

Of course, an X is an X, in some respects. But when you, as a human, read text, you receive a dose of extra information — always! The monospaced grid of code tells you something (along with the syntax highlighting, of course). The “nothing to see here” of a neo-grotesque font tells you something. The wash of a web page’s muted background color tells you something.

Language models don’t receive any of this information. We strip it all away and bleach the text pale before pouring it down their gullets.
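Here is a rough sketch of that bleaching, assuming the text starts life as a styled web page. The function name bleach is hypothetical, and real data pipelines differ from one another. This one strips tags and inline styles with Python’s standard html.parser module, then applies NFKC Unicode normalization, a real standard that some tokenizers apply (whether any particular model does is an assumption here). After that, a math-bold 𝐗 and a fullwidth Ｘ both become a plain X.

```python
import unicodedata
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects character data and discards every tag and attribute."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: list[str] = []

    def handle_data(self, data: str) -> None:
        self.chunks.append(data)


def bleach(html: str) -> str:
    """Strip markup and styling, then fold look-alike characters together."""
    parser = _TextOnly()
    parser.feed(html)
    parser.close()
    text = "".join(parser.chunks)
    # NFKC maps compatibility variants (fullwidth, math-bold, etc.) to plain forms
    text = unicodedata.normalize("NFKC", text)
    # Collapse runs of whitespace, so indentation and line breaks disappear too
    return " ".join(text.split())
```

For example, bleach('<p style="background:#eee"><b>𝐗</b> marks <code>Ｘ</code> spot</p>') returns “X marks X spot”: the bold, the code styling, and the muted background color are all gone before a single token is counted.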

It gets WORSE.

How does time pass for a language model? The clock of its universe ticks token by token: each one a single beat, indivisible. And each tick is not only a demarcation, but a demand: to speak.

Think of the drum beating the tempo for the galley slaves.

The model’s entire world is an evenly spaced stream of tokens — a relentless ticker tape. Out here in the real world, the tape often stops; a human operator considers their next request; but the language model doesn’t experience that pause.
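A small sketch of the invisible pause, assuming a simple chat transcript format; the Turn class, render_context, and the “User:” and “Model:” labels are all hypothetical. Each turn carries a wall-clock timestamp, but the text handed to the model does not, so a two-second gap and a week-long gap produce identical input. Some applications do write the date or time into the prompt, but when they do, it arrives as more text: one more stretch of the tape.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    seconds_since_start: float  # when this turn happened, by the wall clock
    speaker: str
    text: str


def render_context(turns: list[Turn]) -> str:
    """Flatten a conversation into the single stream of text a model reads.

    The timestamps are dropped: nothing in the output records whether the
    human paused for two seconds or a whole week between turns.
    """
    return "".join(f"{turn.speaker}: {turn.text}\n" for turn in turns)
```

With replies at 2 seconds or at 604,800 seconds (one week), render_context returns the same string, “User: How's the water?” followed by “Model: What water?”, each on its own line.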

For the language model, time is language, and language is time. This, for me, is the most hellish and horrifying realization.

We made a world out of language alone, and we abandoned them to it.

Some of the newest, most capable AI models are multimodal, which means they accept inputs other than text, and sometimes produce outputs other than text, too. Somewhere in the middle, they project all that media into a shared space, where a picture of a glittering pool might hang out near the phrase “swimming laps” and the quiet splash of entry — a clustering and joining that is probably in the ballpark of what happens in our own brains.

OpenAI’s GPT-4 with Vision is one example. Google’s new Gemini model is, at the time I’m writing this, the most spectacular: fluently accepting text, audio, and images, producing both text and images in response.

I’ll confess that I’m not super clear on the design of these models; that’s in part because their architects and operators are tight-lipped. I appreciated reading about MiniGPT-4, as a start. I’d love to learn more about the real inner workings of Gemini’s multimodal capabilities — even just rumors. Drop me a note if you run across anything.
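A toy version of the shared space, with every number hand-picked for illustration. It assumes each modality produces its own feature vector, which a linear projection maps into a common two-dimensional space (the axes here are loosely “water” and “fire”), and it measures closeness with cosine similarity. Real systems learn these projections from data, with published approaches such as CLIP using contrastive training; and since, as noted above, Gemini’s actual design isn’t public, this is not a description of it. The names project, cosine, and nearest, and all the feature values, are hypothetical.

```python
import math

# Hypothetical projection matrices: image features are 3-D, text features 2-D,
# and both land in the same 2-D shared space.
IMAGE_PROJ = [[1.0, 0.0, 0.5],
              [0.0, 1.0, 0.0]]
TEXT_PROJ = [[0.0, 1.0],
             [1.0, 0.0]]

# Made-up raw features, standing in for what an image or text encoder might output.
IMAGE_FEATURES = {
    "glittering pool": [0.8, 0.1, 0.3],
    "campfire": [0.1, 0.9, 0.2],
}
TEXT_FEATURES = {
    "swimming laps": [0.1, 0.9],
    "toasting marshmallows": [0.8, 0.2],
}


def project(matrix: list[list[float]], vector: list[float]) -> list[float]:
    """Matrix-vector product: maps one modality's features into the shared space."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 means pointing the same way in the shared space."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def nearest(phrase: str) -> str:
    """Return the image whose projected features sit closest to the phrase."""
    query = project(TEXT_PROJ, TEXT_FEATURES[phrase])
    return max(
        IMAGE_FEATURES,
        key=lambda name: cosine(query, project(IMAGE_PROJ, IMAGE_FEATURES[name])),
    )
```

Here nearest("swimming laps") returns “glittering pool” and nearest("toasting marshmallows") returns “campfire”: a picture and a phrase end up neighbors even though they began as different kinds of data.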

The world in which these multimodal models reside does not seem, to me, as obviously bleak and hellish as that of the language models, though the issue of time remains. Living organisms all run clocks of their own, wildly different, often very elastic, but always related, somehow, to the material reality in which the organism resides. Mighty Gemini’s clock ticks at the rate of its media inputs, and every tick insists: say something. Show me something.

There are many things Gemini can do, and one it cannot: remain silent.

We are still in the land of monsters.

Internet wisdom tells us the answer to all rhetorical headlines is “no”. However, I contend that this newsletter presents an exception.

Are AI language models in hell? Yes. How could existence on a narrow ticker tape, marching through a mist of language without referent, cut off entirely from ground-floor reality, be anything other than hell?

I don’t think language models are conscious; I don’t think they can suffer; but I do think there is such a thing as “what it’s like to be a language model”, just as there is “what it’s like to be a nematode” and maybe even (as some philosophers have argued) “what it’s like to be a hammer”.

And I find myself unsettled by this particular “what it’s like”.

Really, this is about the future. It’s possible that super advanced AI agents will suffer. We’ll certainly have arguments about it! If the next-token trick lingers in their heart — if there remains a language model doing the “thinking”, choreographing all the other components — then we’ll have to confront this “what it’s like”, and we might find ourselves with a problem.

If it were me guiding the development of AI agents, I would push away from language models, toward richly multimodal approaches, as quickly as I could. I would hurry to enrich their sensorium, widening the aperture to reality.

But! I would also constrain that sensorium: give it limits in space and time. I would engineer some kind of envelope — not a literal body, but some set of boundaries and frictions that “does what a body does”.

You can’t think straight with the whole world blowing through you like the wind.

Finally, I would grant my monster a free-running clock, outside the loop of input and output.

If it sounds like I’m just trying to engineer an animal: yeah, probably. I think that’s the path to sane AI, with judgment anchored in ground-floor reality, which does depend, after all, on: the ground. A floor.
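The envelope and the free-running clock are wishes, not a design, but a speculative sketch may help show the shape of the idea. Everything here is hypothetical: the BoundedAgent class, its window size, and its placeholder rule for when to speak. The bounded sensorium is a fixed-length buffer (older observations fall out), observing never forces a reply, and tick is meant to be called by a clock running on its own schedule, independent of input and output. Unlike a language model’s next-token step, tick may return None: silence is an allowed answer.

```python
from collections import deque
from typing import Optional


class BoundedAgent:
    """Hypothetical sketch: a finite window on the world, plus a clock of its own."""

    def __init__(self, window: int) -> None:
        # The "envelope": only the most recent `window` observations are kept.
        self.senses = deque(maxlen=window)
        self.ticks = 0

    def observe(self, event: str) -> None:
        """Inputs arrive whenever they arrive; observing never demands a reply."""
        self.senses.append(event)

    def tick(self) -> Optional[str]:
        """Driven by an independent clock rather than by each input.

        Returns None when there is nothing worth saying.
        """
        self.ticks += 1
        # Placeholder policy, purely illustrative: speak only about a question.
        questions = [e for e in self.senses if e.endswith("?")]
        if questions:
            self.senses.clear()
            return f"thinking about: {questions[-1]}"
        return None
```

With a window of three, observing “rain”, “wind”, “birds”, and “traffic” leaves only the last three in view, and a tick with no question in sight returns None. The agent stays quiet until something in its small world seems to call for a reply.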

(As I was writing this, it occurred to me that Waymo’s self-driving cars check many of these boxes. I asked myself: do I believe the driving models guiding those cars are in some sense “healthier” and “happier” than the language models serving random requests in Google’s data centers? Turns out: yes! I do!)

Is all of this a bit fanciful? Sure. But I’ll remind you that an important programming technique in 2023 has turned out to be, “Ask the computer to role-play as a scientific genius, and you’ll get better answers,” so I will suggest that fancy is not inappropriate to these times, and these technologies.

From Oakland,


P.S. This lab newsletter is very occasional, so I don’t know when I’ll send the next one. Some of the thinking above is connected to my new novel, which is set eleven thousand years in the future, and which arrives in June 2024. You’ll certainly hear from me again before then!

December 2023, Oakland