Robin Sloan
the lab
April 2024

At home in high-dimensional space

Twenty-sided die with faces inscribed with Greek letters, 2nd century B.C.E.–4th century C.E.

Hello from the lab!

In this edition of my technical newsletter, I want to share a few thoughts and links that have stacked up in my notes recently.


First, though, I want to be sure you know about Moonbound, my new novel coming in June.

Moonbound advance copy, aspirational shelving

This novel is more germane to the interests of this newsletter than my previous work, for a few reasons:

I’ll now encourage you to preorder Moonbound, which you can do anywhere books are sold, in any format you like, print or digital or audio. Barnes & Noble is a great option; Amazon is, of course, very convenient.

I know you understand very well the power of the algorithm: the way attention compounds. What you might not under­stand is the rela­tively modest scale of book publishing success. It only requires sales in the single-digit thousands to pop a book onto the best­seller lists, which can become gateways to further success. The point of the preorder, then, is to focus a diffuse field of interest into the hot week of a book’s release.

That’s all to say, in this domain of culture, your preorder has real consequence. Feel the power!

After you preorder, forward your confir­ma­tion email to preorder@robinsloan.com and, just before the book’s release, I’ll mail you a copy of a limited-edition zine full of world­building clues. Yes, in the real physical mail!

Which brings us to our first technical report … 

Delivering the mail in Val Town

I’ve previ­ously expressed enthusiasm about Val Town. Now, I’ve actually used it for something, and I can report that my enthu­siasm has only grown.

Val Town offers a light­weight web editor for Type­Script functions that can run in a variety of ways. For me, the killer app is the built-in email handler, which works like this:

  1. You create a “val” and designate it an email handler. That val is auto­mat­i­cally connected to an email address that looks something like this: yourUserName.coolEmailHandler@valtown.email.

  2. You write a function that accepts an Email object. This function will be called automatically when new emails arrive, and it can do anything! Maybe you want to save the message to a database. Maybe you want to parse the body and execute some command. Maybe you want to send a reply — Val Town will happily do that. (A sketch of such a function follows just after this list.)

  3. You start sending emails to your val!
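
To make that concrete, here is a minimal sketch of what such a handler val might look like in TypeScript. The shape of the incoming message and the std/email import are reconstructed from memory; treat the field names as assumptions and check Val Town’s docs for the real types.

    // Minimal sketch of a Val Town email-handler val (field names are assumptions).
    import { email } from "https://esm.town/v/std/email"; // Val Town's built-in sender

    interface IncomingEmail {
      from: string;
      subject: string;
      text?: string; // plain-text body, if present
    }

    export default async function (msg: IncomingEmail) {
      const body = msg.text ?? "";

      // Do anything here: save to a database, parse a command, call an API ...
      console.log(`Mail from ${msg.from}: "${msg.subject}" (${body.length} chars)`);

      // ... or send yourself a note back (std/email mails the val's owner).
      await email({
        subject: `Received: ${msg.subject}`,
        text: `Logged ${body.length} characters from ${msg.from}.`,
      });
    }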

I think a partic­u­larly powerful option here is to pass the body over to a language model, along with a prompt explaining what kind of infor­ma­tion you’d like to extract. In this way, an email handler val can become a bridge between the chaotic, unstruc­tured world of email and whatever more formal, schematic require­ments you might have.
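
As a sketch of that bridge (the prompt, the JSON fields, the model name, and the OpenAI SDK usage are all illustrative assumptions here, not a description of my actual val), the extraction step might look something like this:

    // Hypothetical extraction step: hand an email body to a language model and
    // ask for structured JSON back. Everything specific here is an assumption.
    import OpenAI from "npm:openai";

    const openai = new OpenAI({ apiKey: Deno.env.get("OPENAI_API_KEY") });

    async function extractOrderInfo(body: string) {
      const response = await openai.chat.completions.create({
        model: "gpt-4o-mini", // any capable model would do
        response_format: { type: "json_object" },
        messages: [
          {
            role: "system",
            content:
              "You read forwarded bookstore confirmation emails. " +
              'Reply with JSON of the form { "retailer": string, "format": string }.',
          },
          { role: "user", content: body },
        ],
      });
      // The model's reply is the formal, schematic record of a messy email.
      return JSON.parse(response.choices[0].message.content ?? "{}");
    }

    // In the handler sketched above: const info = await extractOrderInfo(body);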

Indeed, this is part of what’s happening behind the scenes with my preorder registration, described above. I wanted to avoid even the minimal friction of “please fill out this Google Form”, and since an online order is almost always repre­sented by a confir­ma­tion email, I wondered, “What if people could just forward those emails along? What if I … didn’t have to read them all myself?”

Getting that flow up and running as a val was easy and fun.

This is not a huge deal; honestly, a Google Form would have been fine. But, I am an incor­ri­gible sender of emails-to-self; it’s how I log and manage most of my notes, including many of the items that you’ll find in this newsletter. So, this expe­ri­ence has got me thinking: how might I enrich that flow, adding some structure along the way? What new flavors of emails-to-self might I conceive, with what kinds of useful “side effects”?

This is all to say, Val Town has fired up my imagination — always a good sign. The company and userbase alike are small enough to feel convivial and responsive; it’s a cool platform at a cool time.


Robin Rendle feels power­fully the romance, the energy, the POTENTIAL of modern CSS, and he expresses all that in his new blog and newsletter, The Cascade. I was a devoted reader of Robin’s contri­bu­tions to CSS Tricks, so I am all in for this new stream of writing and discovery.


Meta proclaims that its Llama 3 language model was “pretrained on 15T tokens from publicly available sources”.

Fifteen trillion tokens! Call that eleven trillion English words. If, just for fun, we say that 100,000 words equals a book, that’s equiv­a­lent to a hundred million books — only slightly less than the total number (estimated by Google Books) to have been published, ever, since the invention of the printing press.
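
(For the curious, the back-of-envelope arithmetic, with the words-per-token ratio as a rough assumption:)

    // Back-of-envelope only; ~0.75 words per token is a rough assumption.
    const tokens = 15e12;          // 15 trillion tokens
    const words = tokens * 0.75;   // ≈ 11 trillion English words
    const books = words / 100_000; // ≈ 110 million "books"
    console.log(words, books);     // vs. roughly 130 million ever published, per Google Books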

I have to confess, it remains strange to me that the AI folks worried, for so many years, about the composition of their training corpora — not only their legal status but also their structure, their origins and omissions … 

 … and then, starting sometime around 2021 or 2022, they simply: didn’t.

I think that’s because beggars can’t be choosers, and every engineer of a large model is presently, where data is concerned, a desperate beggar indeed. The require­ment for SO MUCH DATA probably reveals something deeply weird, and far from ideal, about the current approach to modeling and training these systems. Yet the connec­tion is clear: they do powerful things when you dump more data into them, so, ideal or not: more data must be had!

A recent edition of Jack Clark’s indis­pens­able AI newsletter discussed the produc­tion, to this end, of synthetic data — i.e., data not gathered from the “real world”, but produced expressly for AI training. Sometimes it’s produced using straight­for­ward computer code; sometimes it’s produced by another AI model 😵‍💫

It’s odd to contemplate these vast new “books” — training corpora — that will never be read (could never be read) by any human. They can be understood and evaluated only statistically, through spot-checking or automated analysis.

I’ll never stop saying: it’s tragic that Borges missed all this. He would have loved it.


The digital essay Models All the Way Down by Christo Buschek and Jer Thorp takes a beautiful swing at these complexities. I like the bit that Robin Rendle picked out:

Here we find an important truth about [this dataset]:

It contains less about how humans see the world than it does about how search engines see the world. It is a dataset that is power­fully shaped by commercial logics.

A different activity altogether

When you raise questions about AI training data — anything related to copyright, fair use, attribution, etc. — you’ll often encounter a defense that goes something like this:

What’s the big deal? Robin Sloan “trained” himself on a ton of copy­righted books, didn’t he? He learned from them, then went on to write books of his own, and nobody calls that copyright infringement!

This might be a reason­able argument if AI models operated at the speed and fidelity of human writers and artists. It’s true, Robin Sloan did read a ton of copyrighted books. However, he did not read all the copy­righted books, and even then, the task took him decades. Furthermore, he generates output at the rate of approximately one book every four years, which works out to approx­i­mately one token per hour 😉

When capa­bility increases so substantially, the activity under discus­sion is not “the same thing, only faster”. It is a different activity altogether. We can look to the physics of phase change for clues here.

Basically, I want to immunize you against this analogy, and this objection. There’s plenty to debate in this nascent field, but any compar­ison between AI training and human education is just laughably wrong.

This is what digi­ti­za­tion does, again and again: by removing friction, by collapsing time and space, it under­mines our intu­itions about produc­tion and exchange.

No human ever metab­o­lized infor­ma­tion as completely as these language models. “As our case is new, so we must think anew, and act anew.” You’re gonna need fresh intu­itions.

Black box science

What’s the value of these ever-growing AI models, really? I know several people working in this domain who believe the goal is one (1) thing of over­riding consequence, which we might call virtual Wolfgang Pauli, or maybe on-demand Albert Einstein: an AI model that can actually produce path­breaking scientific theory.

In this formulation, economic and social trans­for­ma­tion would be a second-order effect: not of “AI itself”, but of tech­nology derived from science it produces.

I think this vision is weirdly more plausible than AI as “general labor replacement”. I suppose you could counter by saying, if they can engineer a virtual Pauli, they can FOR SURE engineer a virtual employee; one task is strictly “easier” than the other. But I don’t know if that’s true. Was Albert Einstein a good employee?

I think about this scenario a lot. For me, it’s more imag­i­na­tively compelling, with fewer readymade sci-fi precedents, than “AI leisureworld”.

Reading histories of physics in the early 20th century, it’s thrilling to learn about the intel­lec­tual and social ferment, the rich network of “whys” behind every step forward. In an imaginary AI-powered annus mirabilis, those “whys” might be absent. No story; no development; just theory. But also, perhaps, testable predictions, and new expla­na­tions for weird phenomena, as conse­quen­tial as Einstein’s for the preces­sion of Mercury.

But science is a social process; the AI folks under­stand this very well. How would AI-generated “raw theory” be channeled into the real world of science and tech­nology? How would you know when your virtual Pauli had a theory worth testing? What if it spat out a million theories, and you had good reason to believe one of them was correct — a real paradigm-buster — but you didn’t know which?

I come down on the side of skepticism, but/and … it’s chewy stuff! Fun to think about.


My memory of the terrific TV series Halt and Catch Fire has grown a bit murky, which means it’s almost time for a rewatch; that would make my third viewing. If you’re a technical or technical-adjacent person, you really ought to make time for this one. Don’t let the melo­dra­matic opening episodes throw you — the series quickly matures into one of the best-ever tele­vi­sual produc­tions about collab­o­ra­tion and creation.

Faience polyhedron inscribed with Greek letters, 2nd-3rd century C.E.

I encourage you to take just a moment to contem­plate this twenty-sided die, and the one depicted at the top of this edition, too. These were warmed by palms nearly two thousand years ago. You could play D&D with them today, if the Met would let you.

Durable delights. Computer programs can’t do this — not yet.

From the lab in Berkeley,

Robin

P.S. This lab newsletter is very occasional, so I don’t know when I’ll send the next one. It probably won’t be until after June, so please do preorder Moonbound, and forward your confir­ma­tion email along to preorder@robinsloan.com, where my val will obedi­ently process it. Thanks for your support!
