Notes from the Quest Factory

Recently, I used an AI trained on fantasy novels to generate custom stories for about a thousand readers. The stories were appealingly strange, came with maps (MAPS!), and looked like this:

OMG the story I just received in the mail, generated by @robinsloan’s AI program based on my interests, is SIMPLY DELIGHTFUL. Look at this banger of a first page.

— Dan Cohen (@dancohen) June 8, 2019

Here, I want to share some notes that might be useful to other people doing similar projects, and/or people who imagine they might.

Okay — first I’ll do philosophy, then technology. Feel free to skip ahead if you like.

I see what you did there

Honestly, I think the key to this project wasn’t the AI but the paper.

I’m very happy to have discovered Lob, a service that allows you to print and mail things using code. How are these things printed? From where are they mailed? I have no idea, which is mildly disconcerting, but also mildly magical. I mean, this function — 

response = lob.letters.create({
  description: "Letter for #{purchase["email"]}",
  to: {
    name: purchase["ship_to_name"]
    # etc.
  },
  from: {
    name: "Year of the Meteor"
    # etc.
  },
  file: pdf_name,
  double_sided: true,
  mail_type: "usps_first_class"
}, {"Idempotency-Key" => purchase["email"]})

—sends a letter in the mail! For about a dollar! That’s wild!

Why did I want to print and mail these stories? After all, I could have built a quest generator on the web, accessible for free. A series of prompts; a map; a squirt of AI.

I could have, sure. And … then what?

People might have found their way to the page and laughed for a moment at what emerged. Snapped screenshots, posted them. And then: on to the next bauble! There’s no shortage. Perhaps you’ve crafted some of these baubles yourself. You might know this feeling.

Another day, another “I see what you did there.”

By contrast: because these stories were delivered physically, I have photographs of letters in people’s front yards. In their houses. WITH THEIR DOGS.

I'm curious so I open the mysterious envelope outside. Inside is a MAP! It's @robinsloan's 'Year of the Meteor' AI (neural network?) adventure. Curious dog not included.

— Jonathan Fly (@jonathanfly) June 8, 2019

I was attracted to AI language models in the first place because they showed me sentences that had a strange and ineffable flavor. It’s like English as a second language, except the first isn’t Spanish or Swedish but rather, I don’t know, Martian. For someone who enjoys words, who likes it when sentences are weird and/or beautiful and/or unexpected, that’s obviously appealing.

But, if that’s the appeal, then the challenge is to get people to actually READ THE SENTENCES. Not just appreciate the framing; not just nod at the technology.

Upon encountering these quests, did readers’ souls quiver? Did their eyes film with tears, blurring the text? Er, no. But some of them really did spend some time with their printouts. For me, that’s crucial; non-negotiable. “I see what you did there” is weak gruel. I am in this to have people read things.

My AI-generated quest from @robinsloan is full of strange and wondrous things. My favorite is: "Fenris was a severed king, a dwarf, not a dwarf..." I'm also fond of the hornless unicorn that was "a man with a horn painted on his chest..." Very good stuff indeed.

— Paul F. Olson (@pfolson) June 10, 2019

Okay, enough aesthetic hand-wringing. Now for the nerdy stuff!

The skeleton

Here, I’ll outline the process I used to generate these quests.

Update: I gave a talk about these techniques at the Roguelike Celebration in October 2019. The recording offers a nice way to get this information, with a special emphasis on the quality of the AI-generated language.

My invitation to participate enticed about a thousand people to pay a few dollars and fill out a Google Form, specifying things like the name of their quest’s leader, the kind of artifact their questers sought, the species of creature encountered on the road — you know, quest essentials!

Even more essential to a quest, perhaps, is a map.

AHHH I love it

Using Ryan Guy’s terrific Fantasy Map Generator code, I churned out a few thousand maps, each different, but/and also very similar to the one above. (And, let’s be real … these maps are the stars of the show. You can stop reading now.)

The place names all came from a tiny neural network trained on a selection of real place names from world history. Reviewing the input file now, I see that I used lists of towns in England, Italy, France, Denmark, Japan, and ancient Rome. Neural networks can work as blenders, mixing up structures and phonemes in an appealing way. They are really, really good at names!
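I don’t have the network’s code to show here, but the “blender” idea can be illustrated with a much simpler stand-in: a character-level Markov chain over a few real names. This is not the technique I actually used — a neural network mixes more smoothly — but it makes the phoneme-blending effect concrete. All the names below are hypothetical inputs.

```ruby
# Stand-in for the name-blending neural net: a character-level
# Markov chain. It records which character follows each pair of
# characters in the training names, then walks those statistics
# to emit new names that remix its inputs.
def build_model(names, order = 2)
  model = Hash.new { |h, k| h[k] = [] }
  names.each do |name|
    padded = "^" * order + name.downcase + "$"   # ^ = start, $ = end
    (0..padded.length - order - 1).each do |i|
      model[padded[i, order]] << padded[i + order]
    end
  end
  model
end

def generate_name(model, order = 2, rng = Random.new)
  state = "^" * order
  out = ""
  loop do
    nxt = model[state].sample(random: rng)
    break if nxt.nil? || nxt == "$"
    out << nxt
    state = state[1..] + nxt
  end
  out.capitalize
end
```

Feed it a few dozen English and Japanese town names and it will happily emit hybrids of both; the neural network does the same kind of mixing, just with a much richer sense of structure.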

Next, I downloaded the quest design form responses. A Ruby script assigned each reader a map, then combined the place names on that map with their responses to produce a “story skeleton” that I could feed into the AI text generator.

I need to pause here for a bit of background. The text generator I used was GPT-2, a powerful language model developed by San Francisco’s OpenAI. GPT-2 was initially trained on many gigabytes of text from the web. I continued that training — “fine-tuning” the model — on several hundred megabytes of fantasy novels. My personal GPT-2 now very strongly believes that most sentences ought to be about shadowy keeps and road-weary rangers. (I do not disagree.)

GPT-2’s code gives you the option to provide “context.” Before you ask the model to generate text, you can feed in a sequence of characters to establish, basically, what’s going on in the story. If you do so, GPT-2 will dutifully refer back to the names, the places, and, to a degree, the situations included in that context. It doesn’t stay perfectly consistent — any human writer could do better — but this is a capability that has, until now, eluded AI language models entirely.

This notion of context was key to the quest generation process. I would alternate between getting text out of GPT-2 and feeding prompts in from the story skeleton — in effect, guiding GPT-2 along a particular path.

The Ruby code to produce one story skeleton from a single reader’s map and form looked like this:

  prompt "#{format_for_start(survey[:group])} began \
          their quest in #{city1}, a city known for", 3

  prompt "This quest to defeat the Dark Lord \
          was led by #{survey[:leader]}, who", 2

  prompt "The questers sought #{survey[:seek]}, which", 3

  prompt "They intended to travel #{survey[:travel]}, \
          but, unfortunately,", 1

  prompt "Then, on the road toward #{city2}, \
          they encountered #{survey[:encounter]}. It", 3

  prompt "The questers crossed into the \
          country called #{country1}, known for", 2

  prompt "There, in #{country1}, the Dark Lord found them. He", 2

  prompt "The Dark Lord cruelly", 2

  prompt "Did their quest fail because the questers \
          desired only #{survey[:desire]}? Or was it", 2

  prompt "#{survey[:leader]}'s last thoughts were", 1

  prompt "The world was quiet.", 1, ""

If there’s any part of my process that’s even a little bit novel or interesting, this is it, so I want to pause and point out a few things.

First: I can specify how many sentences I want with the number that follows the prompt text. This is a crucial artistic control! GPT-2 generates a sequence of fixed length; you can’t ask it for “just two sentences, please.” But you can take the fixed-length sequence, break it into sentences yourself (simply splitting it on periods works great), and then only use as many as you want.
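That truncation step fits in a few lines. A minimal sketch — `take_sentences` is my own name for the helper, not a name from the original scripts:

```ruby
# Break a fixed-length blob of generated text into sentences by
# splitting on periods, then keep only the first `n`. Crude, but
# it works well on this kind of prose.
def take_sentences(text, n)
  sentences = text.split(".").map(&:strip).reject(&:empty?)
  sentences.first(n).map { |s| s + "." }.join(" ")
end
```

So `take_sentences("The keep was dark. The road was long. A raven called.", 2)` yields just the first two sentences.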

Second: notice the words I use at the ends of the prompts. I am hardly an AI whisperer, but I do think I’ve learned a bit about nudging a language model towards interestingness. These systems are, in general, very content to just … hang out. They love to describe a scene, then re-describe it, and describe it again, rather than advance the plot with a twist or a turn. (In their defense: they don’t know what a plot is, or a twist, or a turn.) Notice, in the fourth prompt above, the “but, unfortunately,” which produced reliably fun results. You can see that almost all of my prompts “set up” GPT-2 in this way. (And, by contrast, a different version of this template without those guiding words produced stories with palpably less “going on.”)

Third: look closely at the final prompt. Notice the empty string at the end:

  prompt "The world was quiet.", 1, ""

As I was fiddling with these prompts, my friend Dan proposed an idea: what if the text that GPT-2 received and the text the reader read were sometimes different? In the case above, what’s happening is that GPT-2 is seeing the line “the world was quiet,” which will influence the text it generates; however, “the world was quiet” is not being shown to the reader. The reader is instead seeing … nothing. An empty string. So the reader sees only GPT-2’s response to “the world was quiet,” which in practice goes something like

No fires burned, and no lamps were lit.


Every so often, a breeze would rustle the trees and make them shimmer.


For a few moments, he thought he heard the distant sound of an ancient love song.

I think that’s really lovely! There’s no need to preface those lines with “the world was quiet”; they communicate that on their own. This technique of showing text to GPT-2 that you conceal from the reader is a sneaky way of telling the system what you want. It’s the hidden agenda, the moon behind the clouds. I think it’s potentially very powerful, but/and I’ve only scratched the surface here.

The output of the code above was a text file that looked like this:

A pair of thieves began their quest in Easy, a city known for|3
This quest to defeat the Dark Lord was led by Fenris Tusk, who|2
The questers sought a lost grimoire, which|3
They intended to travel quickly, but, unfortunately,|1
Then, on the road toward Lod Herley, they encountered an elk. It|3
The questers crossed into the country called Hagerobonou, known for|2
There, in Hagerobonou, the Dark Lord found them. He|2
The Dark Lord cruelly|2
Did their quest fail because the questers desired only peace? Or was it|2
Fenris Tusk's last thoughts were|1
The world was quiet.|1|
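The `prompt` helper that wrote those lines isn’t defined in the snippet above. A plausible reconstruction — assuming only the pipe-delimited format shown here; the real helper may have differed — would look like this:

```ruby
# Hypothetical reconstruction of the skeleton script's `prompt`
# helper. Each call appends one pipe-delimited line: the prompt
# text (with line-continuation whitespace collapsed), the number
# of sentences to keep, and -- optionally -- the text to show the
# reader instead of the prompt. An empty string hides the prompt
# from the reader entirely, as with "The world was quiet." above.
$skeleton = []

def prompt(text, num_sentences, display = nil)
  line = "#{text.gsub(/\s+/, ' ').strip}|#{num_sentences}"
  line += "|#{display}" unless display.nil?
  $skeleton << line
end

prompt "The world was quiet.", 1, ""
# $skeleton.last => "The world was quiet.|1|"
```

The third field is what makes the hidden-prompt trick, described above, possible: the generator reads the first field, the reader sees the third.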

After I’d generated one of those files for each reader, how did I use it?

A Python script fed the file’s first prompt into GPT-2 as context, then asked it for a blast of text. Next, it filtered that text heavily: truncating to a desired number of sentences, as discussed above; rejecting if wonky (for example, if it included the strings “www” or “http”); and, importantly, checking for words I would never use in my own writing. (For this, I relied on Darius Kazemi’s wordfilter, bulked up with additional words and phrases of my choosing. If you’re using a language model to generate text that will be shown to humans other than you, you must include a step like this. For me, it was crucial to generate a bunch of stories, scout them for scenes or even just ~implications~ I found skeezy or upsetting, and then add filters to reject that kind of content. The stock wordfilter wouldn’t have caught it all, and I wouldn’t have imagined it all, just sitting and speculating. I had to survey the output.)
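The rejection check itself is simple. Here’s a sketch — in Ruby, for consistency with the other snippets, though the real script was Python — where `BANNED` is a tiny stand-in; the real list (wordfilter plus my additions) was far longer and grew as I surveyed the output:

```ruby
# Reject generated text containing any banned substring. The list
# here is a stand-in: the real filter combined Darius Kazemi's
# wordfilter with many hand-added words and phrases.
BANNED = ["www", "http", "<|endoftext|>"]

def acceptable?(text)
  lowered = text.downcase
  BANNED.none? { |word| lowered.include?(word) }
end
```

When a blast of text failed this check, the script simply threw it away and asked GPT-2 for another.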

When the Python script had text in hand that passed all those tests, it fed (1) the original prompt, (2) the text generated in response, and (3) the next prompt back into GPT-2, all concatenated. In this way, the context grew and grew, always a mixture of reader-provided prompts and GPT-2’s own “imagination,” so both could influence the story as it unfolded.
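In outline, that loop looked something like this — sketched in Ruby for consistency with the snippets above (the real glue was Python), with a stand-in generator block where the real script called GPT-2; `run_skeleton` is my own name for it:

```ruby
# Sketch of the context-growing loop. Each prompt is appended to
# the context, the generator (GPT-2, in the real script) responds
# with `n` sentences given that full context, and the response is
# appended in turn -- so every later prompt arrives with the whole
# story so far behind it. The optional `display` field controls
# what the reader sees, per the hidden-prompt trick above.
def run_skeleton(prompts)
  context = ""
  lines = []
  prompts.each do |text, n, display|
    context << text << " "
    generated = yield(context, n)     # real script: call GPT-2 here
    context << generated << " "
    shown = display.nil? ? text : display
    lines << [shown, generated].reject(&:empty?).join(" ")
  end
  lines.join("\n")
end
```

With a hidden prompt (`display` of `""`), only the generated response reaches the finished story; with no `display`, the reader sees both the prompt and the response, stitched together.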

The finished quest was deposited into a plain text file, which another Ruby script transformed into a PDF, which yet another Ruby script sent to Lob for printing and mailing.

You can see an example of a finished quest PDF here.

It finally came! An AI-generated fictional quest and autogenerated map from @robinsloan! Look at those Escher-esque fjords. LOOK AT THEM 😍

Deets: I just filled out a form and the AI did the rest, Mad-Lib style. Some choice snippets below, more in thread.

— Becki Lee (@omgbeckilee) June 11, 2019

Unhelpful pumpkins

Let’s imagine it’s ten years from now, and the super-powerful language model called GPT-2000 can produce an entire fantasy novel all on its own. It does a very competent job, too! The plot is pretty cool, the characters are fun, and every so often, there’s a truly beautiful sentence.

So what?

There’s no shortage of fantasy novels that meet those requirements. In fact, there are already more than (almost) any person can read. They’re available very cheaply or even, if you have access to a public library, for free. So, the potential of this technology isn’t, like, “At last! Something to read!”

What is it, then?

It’s odd to sit and look at this directory of quest stories I generated. There are more than a thousand; I’ll never read them all. When I want to read just one, how do I choose? Randomly, of course. How else?

Now, let’s say the directory wasn’t just stories but full-blown GPT-2000 fantasy novels, a thousand of them, each totally new, never before read by anyone! As I consider that possibility, I ask myself: is the feeling one of great bounty — like a well-stocked fantasy aisle at a library — or is it … something else? I think maybe the directory feels overwhelming, or numbing, or even horrifying.

Let’s say I want to read one of the GPT-2000 novels. Do I just choose a file randomly, as before? I’d be the only one to read that novel, ever. If it was great, there would be no one to talk about it with. If it was great, the novel just below it might be even better, but I’d never know.

Reading the torrent of text generated by a language model, realizing how much of it is, in fact, great — not whole novels worth, of course, or even whole stories, but sentences and paragraphs, definitely; they’re cool and knotty and delightful — and then seeing that text disappear, scrolled away into oblivion, replaced by more text that’s marbled just as richly with greatness, you realize: there’s no shortage of great language. But great language isn’t what makes a story great. It isn’t what makes a story at all.

In the snippet below, the AI-generated text is quite good — 

I’m glad this Quest Against the Dark Lord is only a simulation. It’s cold, and it smells bad.

— Mr. Velocipede (@mrvelocipede) June 10, 2019

—but it’s clear that the best thing on the page, the thing that makes it glow, is the part supplied by a person.

However capable GPT-2 and its offshoots become, the thing that will make their output worthy of our attention is UNHELPFUL PUMPKINS.

June 2019