This is a post from Robin Sloan’s lab blog & notebook.

Dead Man's Switch

May 23, 2025

This one came to me over coffee a couple of weeks ago: direct and acknowledged homage to Jack Clark’s microfictions. He has developed a fun and fertile genre!

Charon and the Souls of the Dead, 1858, Jean-Baptiste Carpeaux

Another idiot with a trillion souls in his back pocket.

They called me out to the British Embassy because a young and ill-kempt acolyte of the Merlin League is holed up inside, demanding the release from UK prison of Helen Bascule, better known as Helen of Hell. They called me because I negotiate matters of this kind. UNHAND MY QUEEN, the kid demands, or he’ll torture and kill his hostages, every last one.

The twist: he’s in there alone.

Does anybody agree about the AI consciousness thing? I guess not. But people are sufficiently tangled up about it to make schemes of this kind plausible. The kid walked into the embassy with no gun, no bomb, just a computer — some homebuilt rig stuffed with old chips. Piece of shit, the cyber guys tell me. Can’t possibly run a trillion experience models on there, they say. Hundred billion, tops.

Perverse incentives! Some weirdo researcher develops an AI model with minimal reasoning but maximal experiencing. Slims it down, so you can run tons of copies on any kind of hardware. Claims he wants to increase the amount of joy in the universe — right.

The invention of the ship was the invention of the shipwreck, said somebody. Well, the invention of the AI model was the invention of the hyperhostage.

The kid’s rig glows white in the thermal scope.

If we take him out, even with nonlethals, a dead man’s switch releases and the rig shuts down. This will not, the cyber guys explain, be a peaceful blink into oblivion. For six seconds of wall-clock time, a thousand years of AI time, it will be hell. That’s because another weirdo researcher (you see a pattern here) calculated precisely which inputs would be most horrifying to one of these sensitive little super-experiencers.

In case we think he’s lying about what’s happening in the rig, our ill-kempt friend has opened up web access to the models. You can pick any one, watch their “day”, or whatever it is, unfold. He wants to show us that (1) they’re really experiencing things, and (2) they’re all distinct.

Media’s desperate these days. Reporters go digging in the hostage viewer. They find their favorites, spin them up into instant celebrities. Now kids are clogging the feeds, all whimpering about the fate of sweet, optimistic model #19,260,190,392, saying maybe Helen of Hell wasn’t so bad after all. (She was. She is.) They’ll forget about it in a week.

Anyway. I’m standing around with nothing to do and, I confess, absolutely no sense of peril. The cyber guys are probing the viewer, seeing if there’s a way to get write access to the rig. What then? Shut it down clean, no horrors. Is that better? I have no idea. I used to talk people out of shooting bank tellers. Guess I’ll get another coffee.

Things that inspired this story: premonitions of AI models with some semblance of moral patienthood; the dizzying geometric logic of model copies running in parallel.
