February thoughts

Transmitted 20260208 · · · 370 days before impact

The Fairy Woods, 1903, Henry Meynell Rheam

I wonder, were there previous seasons in the history of computing that felt like this one? Reading Fire in the Valley, it seems clear that the birth of the personal computer in the late 1970s and early 1980s offered a comparable rush. Did the web boom of the late 1990s feel so urgent? I don’t know … if you were there, I’d love to hear about it.

My own “core tech industry era” of the late 2000s felt exciting, but less urgent and more … fizzy. It was very Millennial, I suppose: affable, easygoing. Too comfortable, probably!

I was pleased to see this big, meaty Nature comment endorsing my claim that AGI is here 😋

Several smart friends responded to my first edition saying some version of this: “It’s not the G for general that bothers me … it’s the I for intelligence.” Which is totally reasonable, but/and, at this point: that’s simply the term.

I sincerely think it’s helpful to understand the word “intelligence” in AI or AGI as an abstract token that means “doing what these things do”.

A thing to notice about Claude’s constitution is that it is VERY long — and not just long, but verbose. Floppy.

But of course this is a document for a strange new audience. It’s not primarily describing Claude’s aspirational character to us; it is describing it to Claude.

The way language models “read” is very different from the way humans read. Basically, they read everything at once — every token in the context window, slammed into place in parallel. Imagine a lens, focusing that whole field of view down into one decision: which token shall I produce next?

A very fluent human reader can “get” a whole sentence in one glance, maybe even a short paragraph, but certainly not an entire 20,000+ word document. Yet taking in the whole document at a glance is exactly (and I mean literally, mechanistically, precisely) what language models do.

This new kind of reading suggests new forms of writing, new standards for style and structure. The constitution’s primary author is on the record saying that most people’s prompts are not long enough: you nearly always benefit by giving a model more to work with.

I’ve always believed brevity was the soul of basically everything, so this makes me feel pretty itchy!

The Gemini API docs discussing the model’s thinking process are fully sci-fi. If you were reading these five years ago, you’d go: wuuuuuuut

Hannu Rajaniemi is both a world-class, bleeding-edge sci-fi writer and a world-class, bleeding-edge entrepreneur. His latest novel Darkome and his new venture Red Queen Bio are two sides of the same intellectual coin; it’s fascinating to consider them together.

I’ve been rereading William Gibson, also reading a few for the first time; so far, I’ve ticked through the sequence: Count Zero, Mona Lisa Overdrive, Virtual Light, Idoru, All Tomorrow’s Parties. (No need to reread Neuromancer, which is already engraved on the inside of my skull.)

They are such weird books … the writing is, more than anything, hypnotic, in a good way. I will never, ever be able to recount the plot of a Gibson novel, but the vibes are immaculate and indelible. AND, now is the time to return to these, because his prescience — about AI, the digital occult, the grain of 21st-century life, everything — is so profound.

Here’s a 21st-century experience: receiving an email (unsolicited? did you subscribe to this? who can recall) with a gnomic link to a cursed PDF with the instructions:

Drop it into an LLM and say “hat on”.

Here’s Jason Willems on the deep puzzle of copyright in the age of AI.

Here is Steve Krouse calling for powerful tools, not blathering agents.

Here is Dave Friedman theorizing about Apple’s on-device inference strategy.

Man … on-device inference makes SO much sense … and it so clearly MUST be where this all ends up … yet I understand that, presently, nothing compares to the leviathan models running on superbeefy chips in distant data centers. How long does this remote regency last? Five years? Ten??

I’m with Nathan Witkin: the METR graph is misleading.

It’s instructive to actually poke at the tasks in some of these model evaluations. OpenAI’s GDPval is a good example. Go ahead: read a couple of random tasks. You’ll notice that they are “realistic” and also … not. The tasks are weirdly hermetically sealed: each is served up alongside everything you need to complete it, like a problem on a standardized test. None requires any interaction with other workers or organizations. They are fully defined in a way that is basically alien to real work. All of this makes sense (it’s what makes an efficient, repeatable eval), but it also provides some reasons to discount the more breathless reports here.

I think any/every AI eval is worth actually inspecting, if/when you have the time. The ARC-AGI minigames are fun! This one took me a few minutes to solve …

P.S. “Make up your own eval then” is a fine retort, and let me tell you, I have ideas for some GOOD ones …

I don’t want critical engagement to crowd out plain recognition of the genuinely liberatory potential of these tools. Because I do recognize it!

One of my current favorite internet hangouts is a cozy space crafted by the writer Craig Mod with the help of Claude Code. Craig recently wrote a bit about the project, and I can add, from my perspective as a user, that it’s wonderful to use software so perfectly contextual, so opinionated.

I’ve always liked the idea that everyone has inside of them one (1) book. Maybe everyone also has one (1) piece of software, and now, with AI coding agents, we will get them out … and be done.

Arcee’s Trinity TrueBase model is cool — the unmasked shoggoth:

If you’re a researcher who wants to study what high-quality pretraining produces at this scale — before any RLHF, before any chat formatting — this is one of the few checkpoints where you can do that. We think there’s value in having a real baseline to probe, ablate, or just observe. What did the model learn from the data alone? TrueBase is where you answer that question.

Most of my aggregate language model time, 2016-2026, has been spent with base models. We can think of these as pure capability without usefulness: wildfire rather than hearth, lightning rather than battery. But, the capability IS all there within them; I want to argue that ALL of it comes from the insanely demanding next-token prediction task.

Everything that follows — finetuning, reinforcement learning, etc. — is essentially “EQing” that blinding potential into something people can actually use and/or enjoy.

It’s like hearing a blast of static, yet knowing there’s a Beatles song hidden inside, if only you can carve it out. Maybe the AI companies ought to start stocking this book in their libraries …

Spencer Chang:

im betting on a future aesthetic rooted in proof of longterm existence - a bronze statue worn away where people have touched it, the patina of old plastic doors, the grooves left in wood from consistent use

This is so canny, so contemporary, so obviously and totally correct. Gibson-level insight.

The great personal superpower of the decade ahead will be: remembering what you wanted to do in the first place.

New capabilities emerge, new manias unfurl, and the drum beats loud: you should be trying this … you better not miss that … and: sure, maybe! But maybe you should also remember what you wanted to do in the first place.

What’s actually interesting to you? What has been interesting for the past ten years? What will still be interesting ten years from now? How do new capabilities support that interest, or not?

Sharpen that keel! You’re gonna need it.

I’ll add one more thing. I think we are in a period so interesting that basically everybody ought to be writing about it: reporting, reflecting, resisting, re-everything — in a blog, a newsletter, an email to friends, whatever.

I said this in my first edition: AI language models are a particularly, maybe even uniquely, human technology. They are paradoxical and poetic, like something out of myth: the uncanny double, the magic mirror. Psychological intuitions have proven as useful as technical insights.

You have a stake in this, simply by virtue of being a talking animal. Might as well jot a few notes for posterity, and for the rest of us, here and now.

From the lab,

Robin