Notes from the quest factory

Recently, I used an AI trained on fan­tasy novels to gen­erate custom sto­ries for about a thou­sand readers. The sto­ries were appeal­ingly strange, they came with maps (MAPS!), and they looked like this:

OMG the story I just received in the mail, gen­erated by @robinsloan’s AI pro­gram based on my interests, is SIMPLY DELIGHTFUL. Look at this banger of a first page. pic.twitter.com/3qXzrde2Oa

— Dan Cohen (@dancohen) June 8, 2019

Here, I want to share some notes that might be useful to other people doing sim­ilar projects, and/or people who imagine they might.

Okay — first I’ll do philosophy, then tech­nology. Feel free to skip ahead if you like.

I see what you did there

Honestly, I think the key to this project wasn’t the AI but the paper.

I’m very happy to have dis­cov­ered Lob, a ser­vice that allows you to print and mail things using code. How are these things printed? From where are they mailed? I have no idea, which is mildly disconcerting, but also mildly magical. I mean, this function — 

response = lob.letters.create({
  description: "Letter for #{purchase["email"]}",
  to: {
    name: purchase["ship_to_name"]
    # etc
  },
  from: {
    name: "Year of the Meteor"
    # etc
  },
  file: pdf_name,
  double_sided: true,
  mail_type: "usps_first_class"
},
  {"Idempotency-Key" => purchase["email"]}
)

—sends a letter in the mail! For about a dollar! That’s wild!

Why did I want to print and mail these sto­ries? After all, I could have built a quest gen­er­ator on the web, acces­sible for free. A series of prompts; a map; a squirt of AI.

I could have, sure. And … then what?

People might have found their way to the page and laughed for a moment at what emerged. Snapped screenshots, posted them. And then: on to the next bauble! There’s no shortage. Per­haps you’ve crafted some of these baubles your­self. You might know this feeling.

Another day, another “I see what you did there.”

By contrast: because these sto­ries were deliv­ered physically, I have pho­tographs of let­ters in people’s front yards. In their houses. WITH THEIR DOGS.

I'm curious so I open the mys­te­rious enve­lope outside. Inside is a MAP! It's @robinsloan's 'Year of the Meteor' AI (neural net­work?) adventure. Curious dog not included. pic.twitter.com/ozdrG6hvMW

— Jonathan Fly (@jonathanfly) June 8, 2019

I was attracted to AI lan­guage models in the first place because they showed me sen­tences that had a strange and inef­fable flavor. It’s like Eng­lish as a second lan­guage, except the first isn’t Spanish or Swedish but rather, I don’t know, Martian. For someone who enjoys words, who likes it when sen­tences are weird and/or beau­tiful and/or unexpected, that’s obvi­ously appealing.

But, if that’s the appeal, then the chal­lenge is to get people to actu­ally READ THE SENTENCES. Not just appre­ciate the framing; not just nod at the tech­nology.

Upon encoun­tering these quests, did readers’ souls quiver? Did their eyes film with tears, blur­ring the text? Er, no. But some of them really did spend some time with their printouts. For me, that’s cru­cial; non-negotiable. “I see what you did there” is weak gruel. I am in this to have people read things.

My AI-gen­erated quest from @robinsloan is full of strange and won­drous things. My favorite is: "Fenris was a sev­ered king, a dwarf, not a dwarf..." I'm also fond of the horn­less uni­corn that was "a man with a horn painted on his chest..." Very good stuff indeed. pic.twitter.com/DoUn5FlZ3E

— Paul F. Olson (@pfolson) June 10, 2019

Okay, enough aes­thetic hand-wringing. Now for the nerdy stuff!

The skeleton

Here, I’ll out­line the process I used to gen­erate these quests.

Update: I gave a talk about these tech­niques at the Rogue­like Cel­e­bra­tion in October 2019. The recording offers a nice way to get this infor­ma­tion, with a spe­cial emphasis on the quality of the AI-gen­erated lan­guage.

My invi­ta­tion to par­tic­i­pate enticed about a thou­sand people to pay a few dol­lars and fill out a Google Form, spec­i­fying things like the name of their quest’s leader, the kind of arti­fact their questers sought, the species of crea­ture encoun­tered on the road — you know, quest essen­tials!

Even more essen­tial to a quest, perhaps, is a map.

AHHH I love it

Using Ryan Guy’s ter­rific Fantasy Map Gen­er­ator code, I churned out a few thou­sand maps, each dif­ferent, but/and also very sim­ilar to the one above. (And, let’s be real … these maps are the stars of the show. You can stop reading now.)

The place names all came from a tiny neural net­work trained on a selec­tion of real place names from world history. Reviewing the input file now, I see that I used lists of towns in England, Italy, France, Denmark, Japan, and ancient Rome. Neural net­works can work as blenders, mixing up struc­tures and phonemes in an appealing way. They are really, really good at names!

Next, I downloaded the quest design form responses. A Ruby script assigned each reader a map and combined the place names on that map with their responses to produce a “story skeleton” that I could feed into the AI text generator.

I need to pause here for a bit of background. The text gen­er­ator I used was GPT-2, a pow­erful lan­guage model devel­oped by San Francisco’s OpenAI. GPT-2 was ini­tially trained on many giga­bytes of text from the web. I con­tinued that training — “fine-tuning” the model — on sev­eral hun­dred megabytes of fan­tasy novels. My per­sonal GPT-2 now very strongly believes that most sen­tences ought to be about shadowy keeps and road-weary rangers. (I do not disagree.)

GPT-2’s code gives you the option to pro­vide “con­text.” Before you ask the model to gen­erate text, you can feed in a sequence of char­ac­ters to establish, basically, what’s going on in the story. If you do so, GPT-2 will duti­fully refer back to the names, the places, and, to a degree, the sit­u­a­tions included in that con­text. It doesn’t stay per­fectly consistent — any human writer could do better — but this is a capa­bility that has, until now, eluded AI lan­guage models entirely.

This notion of con­text was key to the quest gen­er­a­tion process. I would alter­nate between get­ting text out of GPT-2 and feeding prompts in from the story skeleton — in effect, guiding GPT-2 along a par­tic­ular path.

The Ruby code to pro­duce one story skeleton from a single reader’s map and form looked like this:

  prompt "#{format_for_start(survey[:group])} began \
          their quest in #{city1}, a city known for", 3

  prompt "This quest to defeat the Dark Lord \
          was led by #{survey[:leader]}, who", 2

  prompt "The questers sought #{survey[:seek]}, which", 3

  prompt "They intended to travel #{survey[:travel]}, \
          but, unfortunately,", 1

  prompt "Then, on the road toward #{city2}, \
          they encountered #{survey[:encounter]}. It", 3

  prompt "The questers crossed into the \
          country called #{country1}, known for", 2

  prompt "There, in #{country1}, the Dark Lord found them. He", 2

  prompt "The Dark Lord cruelly", 2

  prompt "Did their quest fail because the questers \
          desired only #{survey[:desire]}? Or was it", 2

  prompt "#{survey[:leader]}'s last thoughts were", 1

  prompt "The world was quiet.", 1, ""

If there’s any part of my process that’s even a little bit novel or interesting, this is it, so I want to pause and point out a few things.

First: I can specify how many sen­tences I want with the number that fol­lows the prompt text. This is a cru­cial artistic control! GPT-2 gen­erates a sequence of fixed length; you can’t ask it for “just two sen­tences, please.” But you can take the fixed-length sequence, break it into sen­tences your­self (simply split­ting it on periods works great), and then only use as many as you want.
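A minimal sketch of that truncation step, written in Ruby to match the other snippets here. The original script isn't shown, so `first_sentences` is a hypothetical name, and dropping the trailing fragment is an assumption about how a fixed-length cutoff might be handled.

```ruby
# Hypothetical helper: keep only the first `count` sentences of generated text.
# Splitting on periods is crude (it ignores "?" and "!" and trips on
# abbreviations like "Mr."), but it mirrors the approach described above.
def first_sentences(text, count)
  parts = text.split(".")
  # The generator stops at a fixed length, so the last piece is usually a
  # fragment; drop it unless the text happens to end on a period.
  parts.pop unless text.strip.end_with?(".")
  parts.map(&:strip).reject(&:empty?).first(count).map { |s| "#{s}." }.join(" ")
end

sample = "the finest bells in the realm. Its towers leaned. Nobody minded. The rest is cut"
puts first_sentences(sample, 2)
```

Asking for more sentences than the text contains simply returns every complete sentence available.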

Second: notice the words I use at the ends of the prompts. I am hardly an AI whisperer, but I do think I’ve learned a bit about nudging a lan­guage model towards interestingness. These sys­tems are, in general, very con­tent to just … hang out. They love to describe a scene, then re-describe it, and describe it again, rather than advance the plot with a twist or a turn. (In their defense: they don’t know what a plot is, or a twist, or a turn.) Notice, in the fourth prompt above, the “but, unfortunately,” which pro­duced reli­ably fun results. You can see that almost all of my prompts “set up” GPT-2 in this way. (And, by contrast, a dif­ferent ver­sion of this tem­plate without those guiding words pro­duced sto­ries with pal­pably less “going on.”)

Third: look closely at the final prompt. Notice the empty string at the end:

  prompt "The world was quiet.", 1, ""

As I was fid­dling with these prompts, my friend Dan pro­posed an idea: what if the text that GPT-2 received and the text the reader read were some­times dif­ferent? In the case above, what’s hap­pening is that GPT-2 is seeing the line “the world was quiet,” which will influ­ence the text it gen­erates; however, “the world was quiet” is not being shown to the reader. The reader is instead seeing … nothing. An empty string. So the reader sees only GPT-2’s response to “the world was quiet,” which in prac­tice goes some­thing like

No fires burned, and no lamps were lit.

or

Every so often, a breeze would rustle the trees and make them shimmer.

or

For a few moments, he thought he heard the dis­tant sound of an ancient love song.

I think that’s really lovely! There’s no need to preface those lines with “the world was quiet”; they com­mu­ni­cate that on their own. This tech­nique of showing text to GPT-2 that you con­ceal from the reader is a sneaky way of telling the system what you want. It’s the hidden agenda, the moon behind the clouds. I think it’s poten­tially very pow­erful, but/and I’ve only scratched the sur­face here.

The output of the code above was a text file that looked like this:

A pair of thieves began their quest in Easy, a city known for|3
This quest to defeat the Dark Lord was led by Fenris Tusk, who|2
The questers sought a lost grimoire, which|3
They intended to travel quickly, but, unfortunately,|1
Then, on the road toward Lod Herley, they encountered an elk. It|3
The questers crossed into the country called Hagerobonou, known for|2
There, in Hagerobonou, the Dark Lord found them. He|2
The Dark Lord cruelly|2
Did their quest fail because the questers desired only peace? Or was it|2
Fenris Tusk's last thoughts were|1
The world was quiet.|1|
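
The `prompt` helper itself isn't shown above. Here is a rough guess at its shape, assuming it only needs to collapse whitespace and join the fields with `|`. The name `skeleton_line` and the field handling are hypothetical, and the sketch returns the line rather than writing it to a file.

```ruby
# Hypothetical stand-in for the `prompt` helper used in the template.
# text      - what the language model will see
# sentences - how many generated sentences to keep
# display   - what the reader sees in place of `text` (nil means "same as text")
def skeleton_line(text, sentences, display = nil)
  # A backslash line continuation inside a Ruby string keeps the next line's
  # indentation, so collapse runs of whitespace into single spaces.
  fields = [text.gsub(/\s+/, " ").strip, sentences]
  fields << display unless display.nil?
  fields.join("|")
end

puts skeleton_line("They intended to travel quickly, \
                    but, unfortunately,", 1)
puts skeleton_line("The world was quiet.", 1, "")
```

A format like this breaks if a reader's answer contains a `|` character, so a real version would need to escape or strip it.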

After I’d gen­erated one of those files for each reader, how did I use it?

A Python script fed the file’s first prompt into GPT-2 as con­text, then asked it for a blast of text. Next, it fil­tered that text heavily: trun­cating to a desired number of sen­tences, as dis­cussed above; rejecting if wonky (for example, if it included the strings “www” or “http”); and, importantly, checking for words I would never use in my own writing. (For this, I relied on Darius Kazemi’s word­filter, bulked up with addi­tional words and phrases of my choosing. If you’re using a lan­guage model to gen­erate text that will be shown to humans other than you, you must include a step like this. For me, it was cru­cial to gen­erate a bunch of sto­ries, scout them for scenes or even just ~implications~ I found skeezy or upsetting, and then add fil­ters to reject that kind of con­tent. The stock word­filter wouldn’t have caught it all, and I wouldn’t have imag­ined it all, just sit­ting and speculating. I had to survey the output.)
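
A sketch of the rejection step, in Ruby to match the other snippets rather than the Python of the original script. It does not use the wordfilter library; the word list is a placeholder, and `acceptable?` and its parameters are hypothetical names.

```ruby
# Hypothetical filter: reject text that looks like web debris or that
# contains any blocked word. The default word list is a placeholder; a real
# one would be much longer and built from reviewing actual output.
def acceptable?(text, blocked_substrings: %w[www http], blocked_words: %w[placeholderword])
  lowered = text.downcase
  return false if blocked_substrings.any? { |fragment| lowered.include?(fragment) }

  # Compare whole words so that a blocked word does not match inside a
  # longer, innocent one.
  words = lowered.scan(/[a-z']+/)
  (words & blocked_words).empty?
end

candidates = [
  "The keep was dark and the road was long.",
  "Visit www.example.com for more."
]
puts candidates.select { |text| acceptable?(text) }
```

Whole-word matching is one design choice among several; substring matching catches more variants but also rejects more harmless text.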

When the Python script had text in hand that passed all those tests, it fed (1) the original prompt, (2) the text generated in response, and (3) the next prompt back into GPT-2, all concatenated. In this way, the context grew and grew, always a mixture of reader-provided prompts and GPT-2’s own “imagination,” so both could influence the story as it unfolded.
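
The growing-context loop can be sketched as follows, again in Ruby rather than the original Python. Everything here is an assumption for illustration: `parse_skeleton_line` and `build_story` are hypothetical names, `generate` stands in for the real call to GPT-2, and truncation and filtering are left out to keep the loop visible.

```ruby
# Parse one skeleton line: "prompt|sentences" or "prompt|sentences|display".
def parse_skeleton_line(line)
  text, count, display = line.chomp.split("|", -1)
  { prompt: text, sentences: Integer(count), display: display || text }
end

# `generate` is any callable taking (context, sentence_count) and returning
# text; in the real pipeline this would be the language model.
def build_story(skeleton_lines, generate)
  context = ""
  story = []
  skeleton_lines.each do |line|
    step = parse_skeleton_line(line)
    context += step[:prompt]                 # the model always sees the prompt
    generated = generate.call(context, step[:sentences])
    context += generated + " "               # and its own earlier output
    # The reader sees the display text, which may be empty, plus the response.
    story << (step[:display] + generated).strip
  end
  story.join("\n\n")
end

# A stub generator so the sketch runs without a model.
fake_model = ->(context, count) { " something happened (#{count} wanted, #{context.length} chars seen)." }
lines = ["The questers sought a lost grimoire, which|3", "The world was quiet.|1|"]
puts build_story(lines, fake_model)
```

In the second paragraph of the output, "The world was quiet." is absent from the reader's text even though the stub received it as part of the context.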

The fin­ished quest was deposited into a plain text file, which another Ruby script trans­formed into a PDF, which yet another Ruby script sent to Lob for printing and mailing.

You can see an example of a fin­ished quest PDF here.

It finally came! An AI-gen­erated fic­tional quest and autogen­erated map from @robinsloan! Look at those Escher-esque fjords. LOOK AT THEM 😍

Deets: https://t.co/DfNbteBfee I just filled out a form and the AI did the rest, Mad-Lib style. Some choice snip­pets below, more in thread. pic.twitter.com/mY7EKUcxqU

— Becki Lee (@omgbeckilee) June 11, 2019

Unhelpful pumpkins

Let’s imagine it’s ten years from now, and the super-pow­erful lan­guage model called GPT-2000 can pro­duce an entire fan­tasy novel all on its own. It does a very com­pe­tent job, too! The plot is pretty cool, the char­ac­ters are fun, and every so often, there’s a truly beau­tiful sentence.

So what?

There’s no shortage of fan­tasy novels that meet those requirements. In fact, there are already more than (almost) any person can read. They’re avail­able very cheaply or even, if you have access to a public library, for free. So, the poten­tial of this tech­nology isn’t, like, “At last! Some­thing to read!”

What is it, then?

It’s odd to sit and look at this direc­tory of quest sto­ries I gen­erated. There’s more than a thou­sand; I’ll never read them all. When I want to read just one, how do I choose? Randomly, of course. How else?

Now, let’s say the direc­tory wasn’t just sto­ries but full-blown GPT-2000 fan­tasy novels, a thou­sand of them, each totally new, never before read by anyone! As I con­sider that possibility, I ask myself: is the feeling one of great bounty — like a well-stocked fan­tasy aisle at a library — or is it … some­thing else? I think maybe the direc­tory feels overwhelming, or numbing, or even horrifying.

Let’s say I want to read one of the GPT-2000 novels. Do I just choose a file randomly, as before? I’d be the only one to read that novel, ever. If it was great, there would be no one who I could talk about it with. If it was great, the novel just below it might be even better, but I’d never know.

Reading the tor­rent of text gen­erated by a lan­guage model, real­izing how much of it is, in fact, great — not whole novels worth, of course, or even whole sto­ries, but sen­tences and paragraphs, definitely; they’re cool and knotty and delightful — and then seeing that text disappear, scrolled away into oblivion, replaced by more text that’s mar­bled just as richly with greatness, you realize: there’s no shortage of great lan­guage. But great lan­guage isn’t what makes a story great. It isn’t what makes a story at all.

In the snippet below, the AI-gen­erated text is quite good — 

I’m glad this Quest Against the Dark Lord is only a simulation. It’s cold, and it smells bad. @robinsloan pic.twitter.com/fo9Jf4AX6L

— Mr. Velocipede (@mrvelocipede) June 10, 2019

—but it’s clear that the best thing on the page, the thing that makes it glow, is the part sup­plied by a person.

However capable GPT-2 and its offshoots become, the thing that will make their output worthy of our attention is UNHELPFUL PUMPKINS.

June 2019