The future of... voice
A view from Tracey Follows

Phones are now powerful mobile computers. But how long until voice-activated handsets lead the conversation, asks Tracey Follows, chief strategy and innovation officer at The Future Laboratory.

There is no doubt that we are enjoying a visually led culture right now. The explosion in video is highly visible. Less obvious is what is taking place with voice. It is tempting to think that voice communication is on the decline – and if one views it from the perspective of traditional telephony, that is certainly the case. But voice is a medium that is increasingly cropping up in new places; liberated from the traditional landline, it can now go anywhere.

The rise of voice

Voice has become embedded in many websites. More importantly, it’s taking off in search. In the US, well over 40% of Google search queries are now voice-activated. One suspects that the UK user base won’t be far behind. This is another step toward the transformation of our mobile phones into dictaphones that direct digital personal assistants to perform our daily tasks.
Amazon’s Echo is the in-home equivalent, and an example of a far-field voice-recognition service that searches everything in your life for you, from Yelp to Spotify to your Google calendar – and gets smarter the more you use it. But it could even get to the point where our digital assistants give us advice on our behaviour, based on our tone of voice. If we are getting heated on a call, perhaps our assistant will suggest we take a break or allow the other person to have their say. If we sound stressed, perhaps it searches for and suggests only calming music choices on Spotify.

There will be a huge growth in the contextual information surrounding voice. Our devices will come to sense our movements, heart rate, biometrics and haptic indicators, and put those together with our locations and data history, plus our own voice and even other audio signals in the surrounding environment.

Evolution of voice experimentation

Acoustic engineers at Doppler Labs in New York have created the Here Active Listening system, which can filter everyday noise in or out. The idea is that you can customise the sound around you with a variety of combinations, in order to create the optimum audio experience in any situation. Imagine, for instance, turning up the volume of birdsong as I make an early-morning call during a country walk to someone who can't be there with me in that moment.

But the most interesting evolution of voice experimentation I have come across is from MIT, where researchers working with Adobe and Microsoft have developed an algorithm that can reconstruct an audio signal from a piece of video. Put simply, they filmed people speaking behind soundproofed glass and then analysed the vibrations of a crisp packet that one of the speakers happened to be holding.
The sound makes the packet vibrate, and those minuscule vibrations show up as signals that are invisible to the naked eye but can be picked up by high-speed cameras. From them, the scientists were able to reconstruct the entire conversation – literally seeing what these people were saying.
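The principle can be illustrated with a toy sketch (this is not the MIT algorithm itself, whose motion analysis is far more sophisticated): a sound wave makes an object vibrate, the vibration shows up as tiny per-frame changes in a high-speed video, and averaging each frame recovers a signal that tracks the original sound. All of the numbers below – frame rate, frame size, noise level – are illustrative assumptions.

```python
import numpy as np

# Toy illustration of the visual-microphone idea: a 440 Hz tone drives
# a minuscule, sound-dependent brightness shift in each video frame.
fps = 2400                            # assumed high-speed camera frame rate
t = np.arange(fps) / fps              # one second of "video"
tone = np.sin(2 * np.pi * 440 * t)    # the sound driving the vibration

rng = np.random.default_rng(0)
frames = []
for s in tone:
    # Each 16x16 "frame": static texture, sensor noise, plus a tiny
    # sound-driven offset far too small to see by eye.
    frames.append(100 + rng.normal(0, 0.1, (16, 16)) + 0.05 * s)

# Recover the vibration: average each frame, then remove the DC offset.
signal = np.array([f.mean() for f in frames])
signal -= signal.mean()

# Averaging 256 pixels suppresses the noise, so the recovered signal
# correlates strongly with the original tone.
corr = np.corrcoef(signal, tone)[0, 1]
print(round(corr, 2))
```

In the real experiments the researchers tracked sub-pixel motion across the frame rather than raw brightness, but the recovery step rests on the same observation: many noisy measurements of one tiny vibration, combined, reveal the sound that caused it.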

Perhaps it is time for brands to think about voice communication with their customers because ‘communication convergence’ between man and machine looks inevitable, with the internet becoming the interconnective tissue.

The question is, whose voice will be in command? Will humans be voice-activating their machine assistants, or will machines speak to us directly and suggest what to do next? Even if the latter is the case, one can bet that it won’t be any old synthesised voice we hear, but a perfect replica of our own voice talking back to us. Man and machine, literally in harmony.
