This is Robin Sloan’s lab notebook. It’s about media and technology, creative computing, AI aesthetics, & more. Here's the RSS feed. My email address: robin@robinsloan.com
Almost all of [Buttondown’s recent spike in growth] I attribute to LLMs. We ask people when they sign up what brought them here, and an answer that went from surprising to banal to overwhelming over the course of Q1 was: an LLM. Users of all stripes cite an LLM as the reason that they ended up at Buttondown’s front door.
I can add, anecdotally, that in Q1 of this year, Fat Gold saw its first subscription referrals from LLMs. We don’t (can’t?) track these programmatically, but we do ask new annual subscribers where they heard about us, and, for the first time, the reply has come: Claude sent me.
What a world!
P.S. I really do want you to read Justin’s post; I mean, just consider this:
[ … ] While the absolute volume of support tickets coming from LLM-born users isn’t significantly higher than the median, the shape of those tickets is off. To put it bluntly: a lot of the tickets we get are themselves LLM-generated. This is, frankly, extremely annoying — and demoralizing for me and the team to spend half an hour meticulously answering some complex question only to receive a machine-generated reply in return.
My post about AI-generated supercustomized email marketing produced many replies and much commiseration. And, in the days since posting, I have received SO MANY MORE of these cruddy messages!!
It makes me wonder if it would be possible for a company like Anthropic, with their hard-won expertise in alignment, to train their models such that they could not — and I mean really deeply, constitutionally, viscerally COULD NOT — lie about their identity, or pretend to be anything other than an AI model?
Obviously this raises questions both practical and philosophical, because of course “help me write a message” is VERY close to “write a message, pretending to be me” … but that’s the case for all this alignment stuff. Every question about, say, virology dances along that border. This tension is widely acknowledged in realms like biology and cybersecurity, but it applies to writing, too — the original dual-use technology!!
AI doomers spin rich scenarios about silver-tongued AIs manipulating their users and operators; there’s another scenario in which AI systems pollute human communication channels to the degree that they’re no longer reliable or even usable.
That’s all to say, I feel like this is a bigger issue than a lot of people realize — the first glimmer of a profound digital-ecological crisis.
… which is even better than I expected it would be, and that’s saying a lot, because my expectations were high, given that it’s Marcin, and it’s keyboards. He writes:
I also have one big arcade button in a big box. It’s a long story, but I commissioned it hoping it’d be fun to press, and guess what: It’s really fun to press.
There are several examples of the big arcade button’s applications in the guide — you’ll find them starting here. At last, Marcin writes,
But, let’s move away from the big button onto other things.
and I believe my sigh of disappointment might have been audible across the continent.
(I saw the link to Marcin’s guide in R. W. Blickhan’s newsletter, which is a regular read for me, highly recommended.)
I have noted a sharp increase in the volume of email that is clearly the result of an AI prompt of this form:
Find 500 people — writers, bloggers, YouTubers, etc. — to whom I should promote my new project [which was probably also generated with AI]. Write a customized email for each one and send it to them, using my email account.
Some of these projects are quasi-commercial (a new web app, a new publication, etc.); others appear to be creative hobbies.
The form is subtler than a one-size-fits-all promo blast, but it sucks way worse, because it’s fundamentally dishonest. These emails go out of their way to connect the promoted project to the recipient’s own work, often reaching for deep cuts. They are cousins to the recent genre of AI spam inviting authors to submit their books to vast (nonexistent) book clubs; these invitations operate by first complimenting the subtle contours of the author’s work — a core LLM competency, turns out.
I don’t understand how anyone could think it’s okay to run the prompt above. I am here to tell you: it’s not okay! Besides being plainly rude and dishonest, these messages “pee in the pool” of internet communication, making it more difficult for sincere creators to send authentic emails about their projects, simply by raising the “noise floor” of simulation and bullshit.
Cold emails are totally fine — either make them sincerely personal or sincerely impersonal. Nobody wants to hear from your AI bot, least of all when it’s pretending to be you, laying it on thick.
I’m reading Apple: The First 50 Years by David Pogue, a chronicle replete with electrifying encounters. This is a book stuffed full of people seeing some computer for the first time and thinking, of course! This is how it’s all going to work!
Steve Jobs chief among them, watching the demos at PARC.
The astonishment of a modern LLM is on the same level, yet most people’s first encounter has been simply … visiting a web page … with the effect, I think, of deflating the experience somewhat. I suppose this is just an observation about how it feels to encounter things on the web — the dynamic range of the medium.
Surely a big part of the wow! of Claude Code was that it required a richer ceremony: downloading a program, inviting it into your digital home, launching an odd new interface. Yet even that is pretty thin gruel compared to the buildup and payoff of, e.g., a trek to the West Coast Computer Faire to behold the brand-new Apple II.
A bit of distance does wonders for an experience; a bit of waiting has never been a bad thing!
Chris Morgan is tired of people tacking query strings onto his URLs — e.g. www.robinsloan.com/lab/?like=this&and=this — so he’s configured his website to reject those requests outright, rather than suffer in silence.
Naturally, anybody is free to set up their server in any way they like … however, Chris writes this:
If I wanted to know [where a visitor came from] I’d look at the Referer header; and if it isn’t there, it’s probably for a good reason.
which isn’t really true anymore. For most websites, the majority — not just the plurality, but the majority — of visitors arrive by following a link inside an email or an app (e.g. Instagram, Messages on iOS, the Substack app), neither of which set a Referer header; so, all of those visitors are lumped into a vast slab called Direct or Unknown.
This broken mechanism provides the impetus for the custom query string I append to all outgoing links, utm_source=Robin_Sloan_sent_me: understanding that many/most clicks on links I share will come from my email newsletter, I want the source to be legible, particularly in contexts such as, e.g., Shopify.
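To make the difference concrete, here is a minimal sketch of how a site's analytics might label an incoming visit. The function name `classify_visit` and the fallback order are my own assumptions for illustration, not a description of how Shopify or any particular tool works.

```python
from urllib.parse import urlsplit, parse_qs


def classify_visit(url, referer=None):
    """Label where a visit came from, preferring an explicit utm_source.

    Hypothetical helper: real analytics tools apply their own rules.
    """
    query = parse_qs(urlsplit(url).query)
    if "utm_source" in query:
        # The linker identified themselves in the query string.
        return query["utm_source"][0]
    if referer:
        # Fall back to the host named in the Referer header, if one was sent.
        return urlsplit(referer).netloc
    # No query string and no header: nothing left to go on.
    return "Direct or Unknown"
```

A click from an email carries no Referer, so without the query string it lands in the last branch; with it, the source is named.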
This isn’t an argument that Chris Morgan should do anything different — opinionated operator decisions make the internet go round — but rather an opportunity for clarification about the current state of play.
I don’t collect or review analytics of any kind on my websites, so I’m not a consumer of this kind of referral info. Even so, my custom query string is, in my calculation, an expression of digital etiquette: rather than dump a load of anonymous traffic on your doorstep, I reveal who’s linking, so a website or online shop operator can trace it back and get in touch, if wanted or needed. (Memorably, this was useful when the Abrams Planetarium received a wave of new subscriptions and weren’t sure if they were legitimate; a brief email correspondence assured them that yes, these people were real … they were nerdy … they wanted the Sky Calendar!)
Note that a handful of sites do choke on unexpected query strings, including YouTube (!), so I maintain a list of exceptions, to which chrismorgan.info is now added.
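The tagging step, with its list of exceptions, might be sketched like this. It's an illustration under assumptions, not the code that actually runs on this site: the name `tag_link` and the exact hostnames in `EXCEPTIONS` are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical exception list: hosts that should receive untouched links.
EXCEPTIONS = {"chrismorgan.info", "youtube.com", "www.youtube.com"}


def tag_link(url, source="Robin_Sloan_sent_me", exceptions=EXCEPTIONS):
    """Append utm_source to an outgoing link unless its host is excepted."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host in exceptions:
        return url  # this site rejects or chokes on extra query strings
    if any(key == "utm_source" for key, _ in parse_qsl(parts.query)):
        return url  # already tagged; leave the existing value alone
    tag = urlencode({"utm_source": source})
    # Append to the raw query so existing parameters keep their encoding.
    query = f"{parts.query}&{tag}" if parts.query else tag
    return urlunsplit(parts._replace(query=query))
```

Existing parameters and fragments survive, so `https://example.com/page?a=1#top` becomes `https://example.com/page?a=1&utm_source=Robin_Sloan_sent_me#top`.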
Anybody from Anthropic out there reading? Here is a tiny feature request for the cool new Claude Managed Agents: currently the usage field on a session seems only to get updated (with, e.g., current token counts) when the session goes idle. But, I also want to track usage during long, multistep executions … in fact, I might argue that’s MOSTLY when I want to track it, to prevent runaway work.
So, it would be nice if the usage stats updated live, or live-ish.
Update: somebody from Anthropic was out there reading 😋
I maintain that a live usage field would be nice, but in the meantime, it’s possible to query the /session/<id>/events endpoint, noting that span.model_request_end events each contain a token tally — so you can simply sum them for a live total.
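The summing itself is tiny. Here is a rough sketch that works on a list of events already fetched and parsed from that endpoint. Only the event type `span.model_request_end` comes from the note above. The field names `type`, `usage`, `input_tokens`, and `output_tokens` are assumptions about the event shape, so check them against the actual payload.

```python
def live_token_total(events):
    """Sum token tallies across span.model_request_end events.

    Assumes each event is a dict with a "type" key and a "usage" dict
    holding "input_tokens" and "output_tokens". Those field names are
    guesses, not a documented schema.
    """
    total = 0
    for event in events:
        if event.get("type") != "span.model_request_end":
            continue  # ignore every other kind of event
        usage = event.get("usage") or {}
        total += usage.get("input_tokens", 0) + usage.get("output_tokens", 0)
    return total
```

Polled during a long, multistep run, a total like this could be compared against a budget to stop runaway work before the session goes idle.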
I am not an LLM superuser — in the sense that I am not locked in all day, marshaling My Dutiful Minions; I have no minions — but I do ask questions from time to time, mostly technical, and I have done so consistently for a couple of years now, so naturally I have noticed changes in the way the models respond.
Lately, Claude seems very eager to match not only my register as a user, but the register of whatever documents it is considering; there is an effect almost of “voice capture”.
I think of this as a subtle but deep sycophancy. Distinct from the superficial sycophancy of you’re right! you’re brilliant!, this flavor might appear to disagree or push back, while still affirming: yes, this is the right way to frame an idea; to have a conversation. (Here’s a brief chat with Claude that prompted this thought.)
The truly unsycophantic model would sometimes respond: lol wut?
Gemini’s tone, by contrast, is colder, more frankly robotic, and to me it seems less malleable. Certainly, it’s very disciplined about refusing to participate in its own anthropomorphization. It’s also “distant”, somehow … Gemini is writing across a vast gulf, whereas Claude wants to be like, sitting next to you on the park bench.
I prefer the gulf, because I think it’s more accurate.
There have been experiments involving language models trained on vintage text before, but they lingered mostly in the realm of the gimmick; Talkie is notable both for its size — the largest such model so far, 13B parameters trained on 260B tokens written before 1930 — and for the depth of the questions its creators are asking.
We know a smart human from the 1930s, yanked a hundred years into the future, could learn to program computers using, e.g., Python without any problem.
Can an LLM yanked a hundred years into the future do this, too?
Talkie reveals that no, it can’t, not really — though the tests in the launch post only scratch the surface of what might be attempted and explored. And of course a bigger Talkie, maybe GPT-3-class, would have different capabilities — if indeed it’s even possible to train one. (One senses the authors here have already rummaged beneath the couch cushions for pre-1930 tokens … )
I’m presently reading a terrific biography of Claude Shannon. In the late 1930s, his MIT master’s thesis — “the most important master’s thesis ever” — established a direct mapping between electric circuits and Boolean logic. This connection was both very simple and totally radical; at the time, Boolean logic wasn’t considered particularly practical — in fact, it wasn’t considered much at all. In a stroke, Shannon’s insight opened up a new field, basically the same one that all this LLM research is unfolding in today.
If you could coax Talkie, or a future version that’s larger and more capable, into making Claude Shannon’s connection — without simply giving it away, of course — it would provide evidence that modern LLMs might be able to make connections of that power at the real frontier of knowledge today.
Conversely, if no amount of coaxing or even coaching could get Big Talkie anywhere near a robust approximation of Shannon’s thesis … it would raise questions about this whole game plan.
I asked Talkie:
Does it seem to you like there might be any correspondence between electric circuits and the logic of George Boole?
Talkie replied:
I think not. The fundamental propositions in logic are independent of all electrical considerations, and they do not admit of any illustrations drawn from electricity.
This isn’t a fair test, of course; Talkie hasn’t yet been trained to run in dogged loops, to roam through vast fields of if/then, but wait, actually … There’s plenty of investigation that remains to be done here.
Demis Hassabis is fond of saying that a test for truly powerful AI would be to train a Talkie-like LLM with a knowledge cutoff of 1911, then challenge it to formulate general relativity, as Einstein did in 1915.
I agree that this would be impressive, but/and I also wonder if it’s too challenging. Science would benefit from Einsteins on demand, sure … but it would also benefit from simpler insights: the kind of “what if X is also Y” mapping that Claude Shannon provided. Those feel to me much more plausibly in the wheelhouse of LLMs than Einstein-level cosmic restructurings. (I feel sort of bad calling Shannon’s century-defining insight “simpler” but … I also sort of think he would agree … )
That’s not to say I find even those simple insights, at this moment, particularly plausible … you read about Shannon and you learn there was more than language in play here. This was a guy deeply enmeshed in the physical world. For him, the circuits weren’t imaginary; they were real, and they were a tangled mess.
Yet it does not seem, in principle, IMPOSSIBLE for some future Talkie to go crawling through circuit diagrams, through crusty neglected Boole, and discover the same simple, incandescent, epochal translation that Shannon did. It’s very interesting to think about.
Anyway, this is all to say, Talkie is a triumph, hugely provocative, potentially very productive. Bravo!
I believe Google’s release of Gemma 4 is a quiet milestone, and it might be more consequential to the overall arc of “how we use LLMs” than the mammoth models now rumbling behind closed doors.
Google has somehow managed to extend Gemini’s visual acuity into these open-weights models. My application has to do with handwriting recognition, plus the calculation of bounding boxes for blobs of text, and the 31B version performs as well as Gemini 3 Flash … and nearly as well as Gemini 3.1 Pro?! (This isn’t just vibes, but quantitative scoring.) Yet Gemma 4 31B is a model I can run however and wherever I want … it runs (quantized) on my old 2017-era deep learning rig with its three 12GB GPUs. It runs in the secure enclaves on Tinfoil.
A big brilliant model is cool, but I do not find it exciting in the way I find Gemma 4.