This is a post from Robin Sloan’s lab blog & notebook.

The voice of the computer

February 22, 2026

Here is the most spectacular demo of present AI systems: pull out your phone, initiate Gemini’s live voice mode, say “please translate this conversation between English and Japanese”, and allow the system to act as a responsive and competent interpreter.

ChatGPT offers a mode like this, too; it’s clear Google and OpenAI have both invested a ton in these features, and that both believe they will represent a significant — THE significant? — interface to their models. Indeed, OpenAI’s upcoming devices are all premised on this.

Meanwhile, it seems odd to imagine “the voice of Claude”. I’m sure Anthropic could buy itself a voice mode, but this interface shouldn’t be mistaken for a language model sandwiched between voice recognition and TTS — it’s subtler and more fluent than that. I believe (though of course I could be wrong) these voice modes are the result of supplemental end-to-end training.

I’ve demoed Gemini’s live voice mode several times, always to great reception, yet never actually used it for a practical purpose. I mean: I was just in Japan for two weeks! And I don’t speak Japanese! And I never used it!

I still imagine I might — it is really sharp — but, for me, it has remained a dazzling demo, not a useful tool. The infinite bummer of “let’s both talk into my stupid phone” remains an insuperable barrier.

I’d love to know the usage statistics that Google and OpenAI are seeing; I’d love to know if and how people are really engaging with the voice of the computer, straight out of Star Trek.
