by James Vlahos, Houghton Mifflin Harcourt, 2019
Steve Jobs could be relentless when he wanted something. In early 2010, he wanted a small startup in San Jose, Calif. CEO Dag Kittlaus and his cofounders had just raised a second round of funding and didn’t want to sell. Jobs called Kittlaus for 37 days straight, until he wrangled and wheedled a deal to buy the two-year-old venture for Apple at a price reportedly between US$150 million and $200 million. The company was Siri Inc.
Wired contributor James Vlahos tells the story of how Siri took up permanent residence in the iPhone in his new book, Talk to Me. It’s the first nontechnical book on voice computing that I’ve seen and a must-read if you have any interest in the topic.
Vlahos spends the first third of Talk to Me describing the platform war currently raging in voice computing. He details the race among the big players, including Amazon, Google, and Apple, to embed AI-driven voices in as many different devices as possible, as they seek to dominate the emerging ecosystem. The fact that Amazon now has more than 10,000 employees working on Alexa provides a good sense of the dimensions of that race.
But voice computing is more than a platform play. It is likely to have ramifications and applications for every company, especially if Vlahos’s contention that “the advent of voice computing is a watershed moment in human history” turns out to be right.
“Voice is becoming the universal remote to reality, a means to control any and every piece of technology,” he writes. “Voice allows us to command an army of digital helpers — administrative assistants, concierges, housekeepers, butlers, advisors, babysitters, librarians, and entertainers.” Voice will disrupt the business models of powerful companies — and create new opportunities for upstarts — in part because it will put AI directly in the control of consumers, Vlahos argues. “And voice introduces the world to relationships long prophesied by science fiction — ones in which personified AIs become our helpers, watchdogs, oracles, and friends.”
This fanciful future is barely evident in the present relationship I have with the cylindrical smart speaker sitting on my desk. The ring around the top of the speaker is usually glowing red. That’s because it’s on mute, which, I’m assured (but not completely reassured), precludes its maker’s employees from listening to me at will. But I do get a glimmer of how valuable that relationship could become when I unmute the device and ask it for the weather forecast, or order up a particularly tasty Grateful Dead jam. Voice is the simplest, most natural interface with technology yet invented.
“With voice,” Vlahos explains, “computers are finally doing it our way. They are learning our preferred way of communication: through language. Voice, optimally realized, has the potential to be so easy to use that it hardly feels like an interface at all. We know how to speak because we’ve been doing it for all of our lives.”
The key words here are “optimally realized.” It is abundantly clear that voice technology is far from it. Vlahos describes why in the remaining two-thirds of Talk to Me, which is devoted to explaining the technology and to exploring the challenges and decisions that lie ahead.
Voice computing is enabled by a mashup of technologies. “The sound waves emanating from your mouth must be converted into words, a process known as automated speech recognition,” writes Vlahos. “Determining what you were trying to communicate with those words is called natural-language understanding. Formulating a suitable reply is natural-language generation. And finally, speech synthesis allows voice-computing devices to audibly reply.”