Notes from the Quest Factory

Recently, I used an AI trained on fan­tasy nov­els to gen­er­ate cus­tom stories for about a thou­sand readers. The sto­ries were appeal­ingly strange, they came with maps (MAPS!), and they looked like this:

OMG the story I just received in the mail, generated by @robinsloan’s AI program based on my interests, is SIMPLY DELIGHTFUL. Look at this banger of a first page. pic.twitter.com/3qXzrde2Oa

— Dan Cohen (@dancohen) June 8, 2019

Here, I want to share some notes that might be use­ful to other people doing sim­i­lar projects, and/or peo­ple who imag­ine they might.

Okay — first I’ll do philosophy, then tech­nol­ogy. Feel free to skip ahead if you like.

I see what you did there

Honestly, I think the key to this project wasn’t the AI but the paper.

I’m very happy to have dis­cov­ered Lob, a ser­vice that allows you to print and mail things using code. How are these things printed? From where are they mailed? I have no idea, which is mildly disconcerting, but also mildly magical. I mean, this function — 

response = lob.letters.create({
  description: "Letter for #{purchase["email"]}",
  to: {
    name: purchase["ship_to_name"]
    # etc
  },
  from: {
    name: "Year of the Meteor"
    # etc
  },
  file: pdf_name,
  double_sided: true,
  mail_type: "usps_first_class"
  },
  {"Idempotency-Key" => purchase["email"]}
)

—sends a let­ter in the mail! For about a dollar! That’s wild!

Why did I want to print and mail these sto­ries? After all, I could have built a quest gen­er­a­tor on the web, acces­si­ble for free. A series of prompts; a map; a squirt of AI.

I could have, sure. And … then what?

People might have found their way to the page and laughed for a moment at what emerged. Snapped screenshots, posted them. And then: on to the next bauble! There’s no short­age. Per­haps you’ve crafted some of these baubles your­self. You might know this feel­ing.

Another day, another “I see what you did there.”

By contrast: because these sto­ries were deliv­ered physically, I have pho­tographs of let­ters in peo­ple’s front yards. In their houses. WITH THEIR DOGS.

I'm curious so I open the mysterious envelope outside. Inside is a MAP! It's @robinsloan's 'Year of the Meteor' AI (neural network?) adventure. Curious dog not included. pic.twitter.com/ozdrG6hvMW

— Jonathan Fly (@jonathanfly) June 8, 2019

I was attracted to AI language mod­els in the first place because they showed me sentences that had a strange and inef­fa­ble flavor. It’s like Eng­lish as a sec­ond lan­guage, except the first isn’t Span­ish or Swedish but rather, I don’t know, Martian. For some­one who enjoys words, who likes it when sen­tences are weird and/or beau­ti­ful and/or unexpected, that’s obviously appeal­ing.

But, if that’s the appeal, then the chal­lenge is to get peo­ple to actu­ally READ THE SENTENCES. Not just appre­ci­ate the framing; not just nod at the tech­nol­ogy.

Upon encoun­ter­ing these quests, did readers’ souls quiver? Did their eyes film with tears, blur­ring the text? Er, no. But some of them really did spend some time with their printouts. For me, that’s cru­cial; non-negotiable. “I see what you did there” is weak gruel. I am in this to have peo­ple read things.

My AI-generated quest from @robinsloan is full of strange and wondrous things. My favorite is: "Fenris was a severed king, a dwarf, not a dwarf..." I'm also fond of the hornless unicorn that was "a man with a horn painted on his chest..." Very good stuff indeed. pic.twitter.com/DoUn5FlZ3E

— Paul F. Olson (@pfolson) June 10, 2019

Okay, enough aes­thetic hand-wringing. Now for the nerdy stuff!

The skeleton

Here, I’ll out­line the process I used to gen­er­ate these quests.

Update: I gave a talk about these techniques at the Roguelike Celebration in October 2019. The recording offers a nice way to get this information, with a special emphasis on the quality of the AI-generated language.

My invi­ta­tion to par­tic­i­pate enticed about a thou­sand peo­ple to pay a few dol­lars and fill out a Google Form, spec­i­fy­ing things like the name of their quest’s leader, the kind of arti­fact their questers sought, the species of crea­ture encoun­tered on the road — you know, quest essen­tials!

Even more essen­tial to a quest, perhaps, is a map.

AHHH I love it

Using Ryan Guy’s ter­rific Fantasy Map Gen­er­a­tor code, I churned out a few thou­sand maps, each dif­fer­ent, but/and also very sim­i­lar to the one above. (And, let’s be real … these maps are the stars of the show. You can stop reading now.)

The place names all came from a tiny neural network trained on a selec­tion of real place names from world history. Review­ing the input file now, I see that I used lists of towns in England, Italy, France, Denmark, Japan, and ancient Rome. Neural net­works can work as blenders, mix­ing up struc­tures and phonemes in an appeal­ing way. They are really, really good at names!

Next, I downloaded the quest design form responses. A Ruby script assigned each reader a map and combined the place names on that map with their responses to produce a “story skeleton” that I could feed into the AI text generator.

I need to pause here for a bit of background. The text gen­er­a­tor I used was GPT-2, a pow­er­ful lan­guage model devel­oped by San Francisco’s OpenAI. GPT-2 was ini­tially trained on many giga­bytes of text from the web. I continued that training — “fine-tuning” the model — on sev­eral hun­dred megabytes of fan­tasy nov­els. My per­sonal GPT-2 now very strongly believes that most sen­tences ought to be about shad­owy keeps and road-weary rangers. (I do not disagree.)

GPT-2’s code gives you the option to pro­vide “context.” Before you ask the model to gen­er­ate text, you can feed in a sequence of char­ac­ters to establish, basically, what’s going on in the story. If you do so, GPT-2 will duti­fully refer back to the names, the places, and, to a degree, the sit­u­a­tions included in that con­text. It doesn’t stay per­fectly consistent — any human writer could do better — but this is a capa­bil­ity that has, until now, eluded AI lan­guage mod­els entirely.

This notion of con­text was key to the quest gen­er­a­tion process. I would alter­nate between get­ting text out of GPT-2 and feed­ing prompts in from the story skele­ton — in effect, guid­ing GPT-2 along a particular path.

The Ruby code to pro­duce one story skele­ton from a sin­gle reader’s map and form looked like this:

  prompt "#{format_for_start(survey[:group])} began \
          their quest in #{city1}, a city known for", 3

  prompt "This quest to defeat the Dark Lord \
          was led by #{survey[:leader]}, who", 2

  prompt "The questers sought #{survey[:seek]}, which", 3

  prompt "They intended to travel #{survey[:travel]}, \
          but, unfortunately,", 1

  prompt "Then, on the road toward #{city2}, \
          they encountered #{survey[:encounter]}. It", 3

  prompt "The questers crossed into the \
          country called #{country1}, known for", 2

  prompt "There, in #{country1}, the Dark Lord found them. He", 2

  prompt "The Dark Lord cruelly", 2

  prompt "Did their quest fail because the questers \
          desired only #{survey[:desire]}? Or was it", 2

  prompt "#{survey[:leader]}'s last thoughts were", 1

  prompt "The world was quiet.", 1, ""

If there’s any part of my process that’s even a lit­tle bit novel or interesting, this is it, so I want to pause and point out a few things.

First: I can spec­ify how many sentences I want with the num­ber that fol­lows the prompt text. This is a cru­cial artis­tic control! GPT-2 gen­er­ates a sequence of fixed length; you can’t ask it for “just two sentences, please.” But you can take the fixed-length sequence, break it into sen­tences your­self (simply split­ting it on peri­ods works great), and then only use as many as you want.
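Here is a minimal sketch of that truncation step in Python. It is not the script used for the project; the function name `take_sentences` is hypothetical, and it assumes that a trailing fragment with no closing period was cut off mid-sentence and should be thrown away.

```python
def take_sentences(text: str, count: int) -> str:
    """Keep only the first `count` sentences of a fixed-length model output."""
    # Naive split on periods. The last piece is whatever followed the final
    # period, so it is either empty or an unfinished sentence; drop it.
    pieces = text.split(".")
    complete = [piece.strip() for piece in pieces[:-1] if piece.strip()]
    # Re-attach the periods that split() removed and keep the first `count`.
    return " ".join(sentence + "." for sentence in complete[:count])
```

A split this simple will trip over abbreviations, ellipses, and sentences ending in question marks, which is a tradeoff: those cases just produce a slightly different cut, and the filtering that comes later can reject anything that reads badly.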

Second: notice the words I use at the ends of the prompts. I am hardly an AI whisperer, but I do think I’ve learned a bit about nudg­ing a lan­guage model towards interestingness. These sys­tems are, in general, very con­tent to just … hang out. They love to describe a scene, then re-describe it, and describe it again, rather than advance the plot with a twist or a turn. (In their defense: they don’t know what a plot is, or a twist, or a turn.) Notice, in the fourth prompt above, the “but, unfortunately,” which pro­duced reli­ably fun results. You can see that almost all of my prompts “set up” GPT-2 in this way. (And, by contrast, a dif­fer­ent ver­sion of this tem­plate with­out those guid­ing words pro­duced sto­ries with pal­pa­bly less “going on.”)

Third: look closely at the final prompt. Notice the empty string at the end:

  prompt "The world was quiet.", 1, ""

As I was fid­dling with these prompts, my friend Dan pro­posed an idea: what if the text that GPT-2 received and the text the reader read were some­times dif­fer­ent? In the case above, what’s hap­pen­ing is that GPT-2 is seeing the line “the world was quiet,” which will influ­ence the text it gen­er­ates; however, “the world was quiet” is not being shown to the reader. The reader is instead see­ing … nothing. An empty string. So the reader sees only GPT-2’s response to “the world was quiet,” which in prac­tice goes some­thing like

No fires burned, and no lamps were lit.

or

Every so often, a breeze would rus­tle the trees and make them shimmer.

or

For a few moments, he thought he heard the dis­tant sound of an ancient love song.

I think that’s really lovely! There’s no need to pref­ace those lines with “the world was quiet”; they com­mu­ni­cate that on their own. This tech­nique of show­ing text to GPT-2 that you con­ceal from the reader is a sneaky way of telling the sys­tem what you want. It’s the hid­den agenda, the moon behind the clouds. I think it’s poten­tially very pow­er­ful, but/and I’ve only scratched the surface here.

The out­put of the code above was a text file that looked like this:

A pair of thieves began their quest in Easy, a city known for|3
This quest to defeat the Dark Lord was led by Fenris Tusk, who|2
The questers sought a lost grimoire, which|3
They intended to travel quickly, but, unfortunately,|1
Then, on the road toward Lod Herley, they encountered an elk. It|3
The questers crossed into the country called Hagerobonou, known for|2
There, in Hagerobonou, the Dark Lord found them. He|2
The Dark Lord cruelly|2
Did their quest fail because the questers desired only peace? Or was it|2
Fenris Tusk's last thoughts were|1
The world was quiet.|1|
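Each line of that file can be read as a prompt, a sentence count, and an optional third field holding the text to show the reader in place of the prompt. The sketch below parses one line under that reading. The layout is inferred from the sample above, the names `Prompt` and `parse_skeleton_line` are hypothetical, and it assumes prompts never contain a `|` character.

```python
from typing import NamedTuple


class Prompt(NamedTuple):
    model_text: str  # what the language model receives as context
    sentences: int   # how many generated sentences to keep
    shown_text: str  # what the reader sees in place of the prompt


def parse_skeleton_line(line: str) -> Prompt:
    # Assumed layout: "prompt|count" or "prompt|count|shown".
    fields = line.rstrip("\n").split("|")
    model_text = fields[0]
    count = int(fields[1])
    # A missing third field means the reader sees the prompt itself; a present
    # but empty third field means the reader sees nothing for this prompt.
    shown_text = fields[2] if len(fields) > 2 else model_text
    return Prompt(model_text, count, shown_text)
```

Under this reading, the trailing `|` on the last line is what distinguishes "show nothing" from "show the prompt."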

After I’d gen­er­ated one of those files for each reader, how did I use it?

A Python script fed the file’s first prompt into GPT-2 as con­text, then asked it for a blast of text. Next, it fil­tered that text heavily: trun­cat­ing to a desired num­ber of sen­tences, as dis­cussed above; reject­ing if wonky (for exam­ple, if it included the strings “www” or “http”); and, importantly, check­ing for words I would never use in my own writing. (For this, I relied on Dar­ius Kazemi’s wordfilter, bulked up with addi­tional words and phrases of my choosing. If you’re using a lan­guage model to generate text that will be shown to humans other than you, you must include a step like this. For me, it was cru­cial to gen­er­ate a bunch of sto­ries, scout them for scenes or even just ~implications~ I found skeezy or upsetting, and then add fil­ters to reject that kind of con­tent. The stock word­fil­ter wouldn’t have caught it all, and I wouldn’t have imag­ined it all, just sit­ting and speculating. I had to sur­vey the out­put.)
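A rough sketch of the rejection checks, in Python. This is a simplified stand-in rather than the wordfilter library's own API: the word list holds hypothetical placeholders, the function name `is_acceptable` is invented for illustration, and matching is on whole words only.

```python
import re

# Hypothetical placeholder entries. A real list would come from a maintained
# blocklist plus additions made after reading actual model output.
BLOCKED_WORDS = {"examplebadword", "anotherbadword"}

# Substrings that suggest the model has wandered off into web debris.
WONKY_SUBSTRINGS = ("www", "http")


def is_acceptable(text: str, blocked_words=BLOCKED_WORDS) -> bool:
    """Return True if generated text passes the wonkiness and word checks."""
    lowered = text.lower()
    if any(marker in lowered for marker in WONKY_SUBSTRINGS):
        return False
    # Whole-word matching on lowercase letters and apostrophes (an assumption;
    # substring matching would be stricter and catch more variants).
    words = re.findall(r"[a-z']+", lowered)
    return not any(word in blocked_words for word in words)
```

A check like this only catches what is already on the list, which is why surveying real output and adding to the list matters more than the code itself.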

When the Python script had text in hand that passed all those tests, it fed (1) the original prompt, (2) the text generated in response, and (3) the next prompt back into GPT-2, all concatenated. In this way, the context grew and grew, always a mixture of reader-provided prompts and GPT-2’s own “imagination,” so both could influence the story as it unfolded.
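The growing-context loop can be sketched in Python as follows. This is an illustration under stated assumptions, not the project's script: `build_story` is a hypothetical name, `generate` stands in for whatever call asks the model to continue a context (and is assumed to return text already cut to the desired number of sentences), `accept` stands in for the filtering step, and the retry limit is arbitrary.

```python
from typing import Callable, List, Tuple


def build_story(
    prompts: List[Tuple[str, str]],
    generate: Callable[[str], str],
    accept: Callable[[str], bool],
    max_tries: int = 20,
) -> str:
    """Alternate skeleton prompts and model output, growing one shared context.

    Each prompt is a (model_text, shown_text) pair: the model sees model_text,
    while the reader sees shown_text followed by the model's continuation.
    """
    context = ""
    story_parts = []
    for model_text, shown_text in prompts:
        # Everything so far, plus the next prompt, becomes the new context.
        context = (context + " " + model_text).strip()
        for _ in range(max_tries):
            continuation = generate(context).strip()
            if accept(continuation):
                break
        else:
            raise RuntimeError("no acceptable text for prompt: " + model_text)
        # The accepted text joins the context so later prompts can build on it.
        context = context + " " + continuation
        story_parts.append((shown_text + " " + continuation).strip())
    return "\n\n".join(story_parts)
```

Passing a stub for `generate`, such as a function that returns a canned sentence, is enough to exercise the loop without a model. Keeping the model call and the filter as parameters also means the hidden-prompt trick needs no special handling: it is just a pair whose second element is an empty string.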

The fin­ished quest was deposited into a plain text file, which another Ruby script trans­formed into a PDF, which yet another Ruby script sent to Lob for print­ing and mailing.

You can see an exam­ple of a fin­ished quest PDF here.

It finally came! An AI-generated fictional quest and autogenerated map from @robinsloan! Look at those Escher-esque fjords. LOOK AT THEM 😍

Deets: https://t.co/DfNbteBfee I just filled out a form and the AI did the rest, Mad-Lib style. Some choice snippets below, more in thread. pic.twitter.com/mY7EKUcxqU

— Becki Lee (@omgbeckilee) June 11, 2019

Unhelpful pumpkins

Let’s imag­ine it’s ten years from now, and the super-pow­er­ful lan­guage model called GPT-2000 can pro­duce an entire fan­tasy novel all on its own. It does a very com­pe­tent job, too! The plot is pretty cool, the char­ac­ters are fun, and every so often, there’s a truly beau­ti­ful sentence.

So what?

There’s no short­age of fan­tasy nov­els that meet those requirements. In fact, there are already more than (almost) any per­son can read. They’re avail­able very cheaply or even, if you have access to a pub­lic library, for free. So, the poten­tial of this tech­nol­ogy isn’t, like, “At last! Some­thing to read!”

What is it, then?

It’s odd to sit and look at this direc­tory of quest sto­ries I gen­er­ated. There’s more than a thou­sand; I’ll never read them all. When I want to read just one, how do I choose? Randomly, of course. How else?

Now, let’s say the directory wasn’t just sto­ries but full-blown GPT-2000 fantasy nov­els, a thou­sand of them, each totally new, never before read by anyone! As I consider that possibility, I ask myself: is the feel­ing one of great bounty — like a well-stocked fan­tasy aisle at a library — or is it … some­thing else? I think maybe the direc­tory feels overwhelming, or numbing, or even horrifying.

Let’s say I want to read one of the GPT-2000 nov­els. Do I just choose a file randomly, as before? I’d be the only one to read that novel, ever. If it was great, there would be no one who I could talk about it with. If it was great, the novel just below it might be even better, but I’d never know.

Reading the tor­rent of text gen­er­ated by a language model, real­iz­ing how much of it is, in fact, great — not whole nov­els worth, of course, or even whole sto­ries, but sen­tences and paragraphs, definitely; they’re cool and knotty and delightful — and then see­ing that text disappear, scrolled away into oblivion, replaced by more text that’s mar­bled just as richly with greatness, you realize: there’s no short­age of great language. But great lan­guage isn’t what makes a story great. It isn’t what makes a story at all.

In the snip­pet below, the AI-gen­er­ated text is quite good — 

I’m glad this Quest Against the Dark Lord is only a simulation. It’s cold, and it smells bad. @robinsloan pic.twitter.com/fo9Jf4AX6L

— Mr. Velocipede (@mrvelocipede) June 10, 2019

—but it’s clear that the best thing on the page, the thing that makes it glow, is the part sup­plied by a per­son.

However capable GPT-2 and its offshoots become, the thing that will make their output worthy of our attention is UNHELPFUL PUMPKINS.

June 2019