
At home in high-dimensional space

April 29, 2024
Twenty-sided die with faces inscribed with Greek letters, 2nd century B.C.E.–4th century C.E.

In this edition of my technical newsletter, I want to share a few thoughts and links that have stacked up in my notes recently.

First, though, I want to be sure you know about Moonbound, my new novel coming in June.

Moonbound advance copy, aspirational shelving

This novel is more germane to the interests of this newsletter than my previous work, for a few reasons.

I’ll now encourage you to preorder Moonbound, which you can do anywhere books are sold, in any format you like, print or digital or audio. Barnes & Noble is a great option; Amazon is, of course, very convenient.

I know you understand very well the power of the algorithm: the way attention compounds. What you might not understand is the relatively modest scale of book publishing success. It only requires sales in the single-digit thousands to pop a book onto the bestseller lists, which can become gateways to further success. The point of the preorder, then, is to focus a diffuse field of interest into the hot week of a book’s release.

Delivering the mail in Val Town

I’ve previously expressed enthusiasm about Val Town. Now, I’ve actually used it for something, and I can report that my enthusiasm has only grown.

Val Town offers a lightweight web editor for TypeScript functions that can run in a variety of ways. For me, the killer app is the built-in email handler, which works like this:

  1. You create a “val” and designate it an email handler. That val is automatically connected to an email address that looks something like this: yourUserName.coolEmailHandler@valtown.email.

  2. You write a function that accepts an Email object (sketched below). This function will be called automatically when new emails arrive, and it can do anything! Maybe you want to save the message to a database. Maybe you want to parse the body and execute some command. Maybe you want to send a reply — Val Town will happily do that.

  3. You start sending emails to your val!
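
To make step 2 concrete, here’s a minimal sketch of an email handler val. I’m assuming Val Town’s conventions as I understand them (a globally available Email type, and the std/email helper for outbound mail), so check the current docs before copying:

```ts
// A minimal email-handler val. Val Town calls this function with the
// parsed message whenever mail arrives at the val's address.
// NOTE: the exact Email fields and the std/email import are assumptions
// to verify against the current Val Town docs.
import { email } from "https://esm.town/v/std/email";

export default async function (e: Email) {
  console.log(`New mail from ${e.from}: ${e.subject}`);

  // Do anything here: save to a database, parse a command, send a reply.
  // std/email defaults to mailing the account that owns the val, which
  // is handy for notifications-to-self.
  await email({
    subject: `Received: ${e.subject}`,
    text: e.text ?? "(no plain-text body)",
  });
}
```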

I think a particularly powerful option here is to pass the body over to a language model, along with a prompt explaining what kind of information you’d like to extract. In this way, an email handler val can become a bridge between the chaotic, unstructured world of email and whatever more formal, schematic requirements you might have.
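
Here’s a sketch of that bridge, pulling a small JSON record out of an order-confirmation email. I’m using Val Town’s hosted OpenAI wrapper, std/openai (an assumption on my part; any LLM client would do), and the prompt, schema, and model choice are all illustrative:

```ts
// Sketch: turn an unstructured email into structured data with an LLM.
import { OpenAI } from "https://esm.town/v/std/openai";

export default async function (e: Email) {
  const openai = new OpenAI();
  const completion = await openai.chat.completions.create({
    model: "gpt-4o-mini", // illustrative; pick whatever model you like
    messages: [
      {
        role: "system",
        content:
          "From this order-confirmation email, extract a JSON object: " +
          '{ "retailer": string, "format": "print" | "digital" | "audio" }. ' +
          "Reply with JSON only.",
      },
      { role: "user", content: e.text ?? "" },
    ],
  });

  const order = JSON.parse(completion.choices[0].message.content ?? "{}");
  console.log(order); // e.g. { retailer: "Barnes & Noble", format: "print" }
}
```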

Indeed, this is part of what’s happening behind the scenes with my preorder registration, described above. I wanted to avoid even the minimal friction of “please fill out this Google Form”, and since an online order is almost always represented by a confirmation email, I wondered, “What if people could just forward those emails along? What if I … didn’t have to read them all myself?”

Getting that flow up and running as a val was easy and fun.

This is not a huge deal; honestly, a Google Form would have been fine. But, I am an incorrigible sender of emails-to-self; it’s how I log and manage most of my notes, including many of the items that you’ll find in this newsletter. So, this experience has got me thinking: how might I enrich that flow, adding some structure along the way? What new flavors of emails-to-self might I conceive, with what kinds of useful “side effects”?

This is all to say, Val Town has fired up my imagination — always a good sign. The company and userbase alike are small enough to feel convivial and responsive; it’s a cool platform at a cool time.


Robin Rendle feels powerfully the romance, the energy, the POTENTIAL of modern CSS, and he expresses all that in his new blog and newsletter, The Cascade. I was a devoted reader of Robin’s contributions to CSS-Tricks, so I am all in for this new stream of writing and discovery.


Meta proclaims that its Llama 3 language model was “pretrained on 15T tokens from publicly available sources”.

Fifteen trillion tokens! Call that eleven trillion English words. If, just for fun, we say that 100,000 words equals a book, that’s equivalent to a hundred million books — only slightly less than the total number (estimated by Google Books) to have been published, ever, since the invention of the printing press.
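
If you want to check that arithmetic, here’s the back-of-envelope version. The words-per-token ratio is my assumption; roughly 0.75 words per token is a common rule of thumb for English text:

```ts
// Back-of-envelope scale of a 15-trillion-token corpus.
// ASSUMPTION: ~0.75 English words per token (rule of thumb).
const tokens = 15e12;
const words = tokens * 0.75;   // ≈ 11 trillion words
const books = words / 100_000; // ≈ 112 million 100,000-word "books"
console.log(`${(words / 1e12).toFixed(1)}T words ≈ ${(books / 1e6).toFixed(0)}M books`);
```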

I have to confess, it remains strange to me that the AI folks worried, for so many years, about the composition of their training corpora — not only their legal status but also their structure, their origins and omissions … 

 … and then, starting sometime around 2021 or 2022, they simply: didn’t.

I think that’s because beggars can’t be choosers, and every engineer of a large model is presently, where data is concerned, a desperate beggar indeed. The requirement for SO MUCH DATA probably reveals something deeply weird, and far from ideal, about the current approach to modeling and training these systems. Yet the connection is clear: they do powerful things when you dump more data into them, so, ideal or not: more data must be had!

A recent edition of Jack Clark’s indispensable AI newsletter discussed the production, to this end, of synthetic data — i.e., data not gathered from the “real world”, but produced expressly for AI training. Sometimes it’s produced using straightforward computer code; sometimes it’s produced by another AI model 😵‍💫
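
The “straightforward computer code” variety can be very plain indeed. Here’s an illustrative sketch (the instruction/response format is my own assumption, not anything from Clark’s newsletter):

```ts
// Sketch: synthetic training data generated by ordinary code,
// no "real world" required.
function syntheticArithmetic(n: number) {
  const examples: { instruction: string; response: string }[] = [];
  for (let i = 0; i < n; i++) {
    const a = Math.floor(Math.random() * 1000);
    const b = Math.floor(Math.random() * 1000);
    examples.push({
      instruction: `What is ${a} + ${b}?`,
      response: String(a + b),
    });
  }
  return examples;
}

console.log(syntheticArithmetic(3));
// e.g. [ { instruction: "What is 412 + 77?", response: "489" }, ... ]
```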

It’s odd to contemplate these vast new “books” — training corpora — that will never be read (could never be read) by any human. They can be understood and evaluated only statistically, through spot-checking or automated analysis.

I’ll never stop saying: it’s tragic that Borges missed all this. He would have loved it.


The digital essay Models All the Way Down by Christo Buschek and Jer Thorp takes a beautiful swing at these complexities. I like the bit that Robin Rendle picked out:

Here we find an important truth about [this dataset]:

It contains less about how humans see the world than it does about how search engines see the world. It is a dataset that is powerfully shaped by commercial logics.

A different activity altogether

When you raise questions about AI training data — anything related to copyright, fair use, attribution, etc. — you’ll often encounter a defense that goes something like this:

What’s the big deal? Robin Sloan “trained” himself on a ton of copyrighted books, didn’t he? He learned from them, then went on to write books of his own, and nobody calls that copyright infringement!

This might be a reasonable argument if AI models operated at the speed and fidelity of human writers and artists. It’s true, Robin Sloan did read a ton of copyrighted books. However, he did not read all the copyrighted books, and even that smaller task took him decades. Furthermore, he generates output at the rate of approximately one book every four years, which works out to approximately one token per hour 😉

When capability increases so substantially, the activity under discussion is not “the same thing, only faster”. It is a different activity altogether. We can look to the physics of phase change for clues here: cool water a degree at a time and it stays water, until suddenly it is ice.

Basically, I want to immunize you against this analogy, and this objection. There’s plenty to debate in this nascent field, but any comparison between AI training and human education is just laughably wrong.

This is what digitization does, again and again: by removing friction, by collapsing time and space, it undermines our intuitions about production and exchange.

No human ever metabolized information as completely as these language models. “As our case is new, so we must think anew, and act anew.” You’re gonna need fresh intuitions.

Black box science

What’s the value of these ever-growing AI models, really? I know several people working in this domain who believe the goal is one (1) thing of overriding consequence, which we might call virtual Wolfgang Pauli, or maybe on-demand Albert Einstein: an AI model that can actually produce pathbreaking scientific theory.

In this formulation, economic and social transformation would be a second-order effect: not of “AI itself”, but of technology derived from the science it produces.

I think this vision is weirdly more plausible than AI as “general labor replacement”. I suppose you could counter by saying, if they can engineer a virtual Pauli, they can FOR SURE engineer a virtual employee; one task is strictly “easier” than the other. But I don’t know if that’s true. Was Albert Einstein a good employee?

I think about this scenario a lot. For me, it’s more imaginatively compelling, with fewer readymade sci-fi precedents, than “AI leisureworld”.

Reading histories of physics in the early 20th century, it’s thrilling to learn about the intellectual and social ferment, the rich network of “whys” behind every step forward. In an imaginary AI-powered annus mirabilis, those “whys” might be absent. No story; no development; just theory. But also, perhaps, testable predictions, and new explanations for weird phenomena, as consequential as Einstein’s for the precession of Mercury.

But science is a social process; the AI folks understand this very well. How would AI-generated “raw theory” be channeled into the real world of science and technology? How would you know when your virtual Pauli had a theory worth testing? What if it spat out a million theories, and you had good reason to believe one of them was correct — a real paradigm-buster — but you didn’t know which?

I come down on the side of skepticism, but/and … it’s chewy stuff! Fun to think about.


My memory of the terrific TV series Halt and Catch Fire has grown a bit murky, which means it’s almost time for a rewatch; that would make my third viewing. If you’re a technical or technical-adjacent person, you really ought to make time for this one. Don’t let the melodramatic opening episodes throw you — the series quickly matures into one of the best-ever televisual productions about collaboration and creation.

Faience polyhedron inscribed with Greek letters, 2nd–3rd century C.E.

I encourage you to take just a moment to contemplate this twenty-sided die, and the one depicted at the top of this edition, too. These were warmed by palms nearly two thousand years ago. You could play D&D with them today, if the Met would let you.

Durable delights. Computer programs can’t do this — not yet.
