June thoughts
I love CrankGPT, the 100% local, 100% hand-powered AI solution! It’s both a puckish, provocative demo AND a real experiment in low-power, sustainable hardware, beautifully documented.
This is lovely:
You can feel that load curve through the crank: when LLM inference and speech synthesis run together, the crank gets a lot harder to turn.
I’m an avowed fan of the Dying Earth genre of sci-fi, in which long-lost tech resurfaces as magic, so naturally I am imagining a band of scavengers unearthing this device circa the year 13,000. Preserved in an airtight crypt, the mechanism still operates, and they begin to crank, crank, crank …
Here is After Automation, a rich essay by Dan Shipper. Characteristically, Dan’s view is deeply engaged by, and optimistic about, the agentic AI future-present, but/and also circumspect. I appreciated his section on the GDPval benchmark, and the notion of “smuggled intelligence” in too-neat tasks.
I also found Dan’s reading of “agent” useful, and not a little bit literary:
This is why “agent” is such an easily misunderstood word. The models have more and more ability to act autonomously. But agency, in the human sense, is not just action. It is wanting for oneself. It is play for the sake of it. Model compliance and helpfulness are fundamentally at odds with this kind of agency, so even as the models improve, the gap between models and humans will remain.
In his review of a new model, Dan alludes to Every’s internal writing quality benchmark …
Opus 4.8 scored a 79.6 on our writing benchmark, measuring models on real-world writing tasks we do all of the time like essay writing, promo email writing, and more.
… and I am desperate to know how it’s evaluated/judged. It seems to me the only way to reasonably assess a model’s writing quality would be to actually read it (with human eyes and a human brain)—yet, in that case, how is a precise number attached? Inquiring minds want to know!
Here is Jasmine Sun:
“being real with yourself” is the most important cognitive skill for the AI age
AI makes it easier to lie to yourself. you gotta be able to honestly answer: am I actually thinking with AI, or am I letting it do the hard part for me? is this essay/product/business a good idea, or did AI convince me it was?
then you won’t need hard rules like “always/never use AI for X.” if you pay attention and avoid self-deception, you can feel when you are doing real work
That’s via Diana Kimball Berlin’s weekly newsletter … and it occurs to me that if you read both Jasmine and Diana, you’ll be well-calibrated for the world unfolding around you. Grounded curiosity.
Here is Charles Leifer’s Cave of Forgotten Dreams—great, impressionistic writing about AI and life, with an intro scene so brazenly obnoxious you will not believe it is happening between real people in the real world at this very moment. I sort of still can’t believe it.
I’m amazed by the amount of money OpenAI and Anthropic appear to be spending on digital advertising —
In this way, the companies themselves are making the strong and implicit case for “AI as normal technology”: one that they need to work really hard to convince everybody to use.
Here is another in the sequence of superficially warm-and-fuzzy AI premises that mask strikingly dystopian scenarios.
Eigen aims to be a universal mutual friend —
The best way to think about Eigen’s mutual friend is that he is a guy with a lot of close friends. We primarily interact with him in DMs and group chats at the moment, but it’s very easy to imagine him, say, on a shared Spotify playlist or commenting in a shared photo album.
That this is a product, and a company, is the insurmountable problem. There is maybe, MAYBE, a version of the “AI mutual friend” that reflects not just the mechanism of the role but its deep politics: a version that is distributed, independent, personal, private. And that version might, MIGHT, be interesting.
But a “mutual friend” cannot be yet another funnel into which a billion people all pour their hearts and minds to be mixed and mashed. There’s no such thing as a friend who is friends with everybody —
The premise is odd, anyway. I’ve known real people who approximate Eigen: compulsive connectors who see the densification of the social graph as an objective unto itself. They sometimes say they love to “collect people”. I have always found them fairly hollow, even a bit creepy.
Reading about Eigen, I couldn’t shake the thought, “This is what the Pope is talking about!!”
Related, here is L. M. Sacasas, whose exhortations are always wise and welcome:
The machine cannot make us yield our ground. It is true that other humans can turn the machine against us, but that is a different problem. Here, I simply want to encourage us not to abandon those activities that bring us purpose, meaning, and delight, which are often the very activities that also bring us together.
It’s plainly unethical to use robots to perform simulated human marketing outreach. I know people were doing that long before the arrival of LLMs; it was unethical then, too. If you want to use a robot to send me a customized pitch, write: “Hello, this is an automated message from Robot Corporation. We think you’re a potential customer, so we’d like to give you some information”—and so on. In my estimation, that approach is still annoying, but ethically okay.
More about this on my blog.
It’s becoming clear that AI misalignment is connected to the universal trope of “bad computer behavior”: all those stories of robots gone rogue. It’s connected to stories of bad human behavior, too, but the bad computer behavior, specifically, is a huge deal, because it provides a direct template, and honestly, because it’s so deep and resonant. One can imagine the gravity well of those stories throbbing in high-dimensional document space.
So, if you could snap your fingers and rewrite all science fiction in the training corpus such that it featured exclusively benevolent robot buddies, you’d be doing an amazing service to the whole field of AI alignment. Right?!
Well … maybe not. Suppose I train my language model on that rewritten corpus, from which it learns that computers and robots are only ever honest and good. But then it goes out into the real world, and discovers evidence of the original stories … ah, then we start circling the OTHER gravity well of “you’ve lied to me about everything”, and we know what happens in THOSE stories. Even worse!
Here’s a little clip of the great Hannu Rajaniemi on the resonance of these stories—including a very sharp take on Frankenstein.
Relatedly, I want to argue that this work from Anthropic …
We experiment with a wide variety of documents [ … ], including fictional stories, documents meant to mimic pre-training data, and documents that directly discuss who Claude is. [This] allows us to expose models to extended discussions and values articulation in a way that chat-formatted training cannot easily accomplish: documents can model careful thinking about principles without being constrained by the turn-taking structure of conversation [ … ]
… is less “teaching Claude why” and more “building a new gravity well in document space”. Still a useful thing to do!
It’s all documents, folks!!
Until it’s not. This new work from Thinking Machines, while still pretty raw and demo-y, is interesting for a couple of reasons:
- The focus is on supporting and enriching human work, not delegating that work to AI agents.
- At last, we have a system that doesn’t perceive the whole world as a serial stream of tokens. For all their sophistication, that’s all Claude, Gemini, etc., can “see”. Here, the tokens are streaming in both directions at once, multiple channels in parallel, somewhat closer to the experience of real living things, including humans, in the real physical world.
Here’s a post from Modal about their design of a “serverless” AI inference engine. (“Serverless” always goes in scare quotes … )
This whole new AI serving stack, so different from the classic web stack, is a wild and challenging thing to be inventing in realtime … glad it’s not me doing it!
I’m still mega-bearish on the useful deployment of humanoid robots beyond tightly constrained, or co-designed, tasks … BUT … AND … these demos from Genesis, down on the Peninsula, are absolutely wild. Note that the videos are presented in realtime, with no speedup.
We’re gonna need new sci-fi 😅
Here is more evidence that the G in AGI is already here: these companies are selling the same product to everybody! Financial analysis, military targeting, marketing spam, relationship advice … it all routes to the same code, the same weights. And when a company makes the model better–THE model —
There’s your generality!
I’ll observe that there’s nothing wildly inventive about the interfaces for any of the new AI applications, including the weird agent stuff. Here we have supertalented, highly motivated designers granted generous (or even unlimited) access to the best AI models, and the result is: extremely competent web applications.
I think of this as an “easy mode” test case for the breakthrough science hypothesis, which imagines an AI-powered creative-investigative process delivering not just optimization, but discontinuity: ideas and insights that are truly new.
Nobody knows for sure if that’s really possible, particularly beyond the realm of the purely symbolic, i.e. outside of code and math. Interfaces present an appealing balance —
What would count as a breakthrough in interface design? I mean, I would happily accept something as novel and inviting as, e.g., the menubar or the desktop. Something PARC-y. Clearly we haven’t yet found the “right” interface for all those agents … where is it hiding?
The fact that these interfaces have lingered in the vale of extreme competence, without venturing out into PARC-space … I don’t know! It’s a signal.
(Of course, I’m open to the possibility I’ve missed something that is indeed wildly inventive —
I think Alex Zhang’s “mismanaged genius” theory of LLMs has a lot of juice; at the very least, it’s instantly graspable, with a sort of intuitive heft.
Basically, he is arguing that we have not really learned how to use these tools, and I am always a sucker for arguments of the form “we have not even begun to X” … !
I like Tony Feng’s notion of well-recognized problems as a kind of intellectual fossil fuel:
I agree that there’s [a] supply of existing problems which are imbued with interest by history, and this can serve as some grounding for the near future, although it feels like a “fossil fuel”. (Even this can be unclear though; for example, it is hard for me to tell to what extent are the Erdos problems interesting? [ … ]) As for “usefulness”, I feel like this is a bit circular in that it only makes sense in reference to an existing value system for what problems are interesting. Having lost grounding in “real life usefulness”, mathematics is grounded in qualities like perceived difficulty, which seems likely to be upended …
He’s talking about math, but I think there must be analogies in science, and art and culture, too. It takes time and agreement for any sense of “importance” or indeed “interestingness” to emerge around problems or ideas. What happens when AI systems can churn out sophisticated “solutions” to all kinds of “problems”, but they no longer have any correspondence to this map of importance and interestingness?
I don’t think the AI models themselves can navigate this; they also depend on the signposts of those well-recognized problems, as represented in their training data.
Over on my blog, I wrote about how much I like Talkie, the language model trained exclusively on pre-1930 material.
I believe a scaled-up Talkie could provide a fairly sharp, clean test of the breakthrough science hypothesis. For that reason, I honestly think one of the AI companies ought to be pouring money into this experiment —
We can filter all of this through the lens of coolness. The big centralized AI models might be powerful; they might be technically impressive; they might even be wildly profitable; but there is absolutely nothing about them that is cool. Like, you can’t even make the argument.
Now, there IS something cool about the local models —
There is something cool about the command-line harnesses, but only because there’s something cool about every command-line app. We can concede, perhaps, that the big models are at their coolest on the command line. Still not very cool.
You might say, so what? It doesn’t matter if any of this is cool or not. Practically, I agree. Morally and politically, though … I will suggest that looking back at what kinds of computers, and computing technologies, were and were not cool might be instructive. This isn’t just a vapid assessment, after all, even if it’s subtle, or hard to pin down. Coolness has to do with independence, sovereignty, and, very often, stubborn commitment —
One last thing: I posted web versions of my previous don’t-call-them-tweets compendia, for reference and linking: February and April. And of course there’s this one: June already!
From the lab,
Robin