This is a post from Robin Sloan’s lab blog & notebook.

The voice of the computer

February 22, 2026

Here is the most spectacular demo of present AI systems: pull out your phone, initiate Gemini’s live voice mode, say “please translate this conversation between English and Japanese”, and allow the system to act as a responsive and competent interpreter.

ChatGPT offers a mode like this, too; it’s clear Google and OpenAI have both invested a ton in these features, and that both believe they will represent a significant — THE significant? — interface to their models. Indeed, OpenAI’s upcoming devices are all premised on this.

Meanwhile, it seems odd to imagine “the voice of Claude”. I’m sure Anthropic could buy itself a voice mode, but this interface shouldn’t be mistaken for a language model sandwiched between voice recognition and TTS — it’s subtler and more fluent than that. I believe (though of course I could be wrong) these voice modes are the result of supplemental end-to-end training.

I’ve demoed Gemini’s live voice mode several times, always to great reception, yet never actually used it for a practical purpose. I mean: I was just in Japan for two weeks! And I don’t speak Japanese! And I never used it!

I still imagine I might — it is really sharp — but, for me, it has remained a dazzling demo, not a useful tool. The infinite bummer of “let’s both talk into my stupid phone” remains an insuperable barrier.

I’d love to know the usage statistics that Google and OpenAI are seeing; I’d love to know if and how people are really engaging with the voice of the computer, straight out of Star Trek.
