The voice of the computer
Here is the most spectacular demo of present AI systems: pull out your phone, initiate Gemini’s live voice mode, say “please translate this conversation between English and Japanese”, and allow the system to act as a responsive and competent interpreter.
ChatGPT offers a mode like this, too; it’s clear Google and OpenAI have both invested a ton in these features, and that both believe they will represent a significant —
Meanwhile, it seems odd to imagine “the voice of Claude”. I’m sure Anthropic could buy itself a voice mode, but this interface shouldn’t be mistaken for a language model sandwiched between voice recognition and TTS —
I’ve demoed Gemini’s live voice mode several times, always to great reception, yet never actually used it for a practical purpose. I mean: I was just in Japan for two weeks! And I don’t speak Japanese! And I never used it!
I still imagine I might —
I’d love to know the usage statistics that Google and OpenAI are seeing; I’d love to know if and how people are really engaging with the voice of the computer, straight out of Star Trek.