From Apple’s Siri to Microsoft’s Cortana and Amazon’s Alexa (not to mention the nameless Google Home assistant), a chorus of pleasant female voices is taking over our computing devices. These “virtual assistants” are at the cutting edge of mobile technology, representing the confluence of advanced artificial intelligence and voice recognition techniques.
Having a pleasant, coherent conversation with your computer is no longer the stuff of science fiction. By allowing users to communicate using only their voice, these systems are changing the very nature of how we interact with our mobile devices. So what do you need to know about voice user interfaces (VUIs), and how are they set to impact the future of mobile applications?
A Brief History of Voice Interaction
Incredibly, the history of speech recognition systems stretches all the way back to 1952, when Bell Laboratories developed “Audrey.” Short for Automatic Digit Recognizer, the Audrey system was able to understand strings of digits from 0 to 9 with a 97-percent accuracy rate. However, Audrey was too expensive and consumed too much space and power to be of interest to general consumers.
Although institutions such as IBM and the U.S. Department of Defense experimented with speech recognition in the following decades, it was only in the 1990s that the technology truly came to the masses. Dragon released its first consumer speech recognition product, Dragon Dictate, in 1990, and BellSouth launched the first consumer voice portal, VAL, in 1996.
Soon, these initial forays into speech recognition were followed by built-in voice commands in Windows Vista and Mac OS X, as well as a number of interactive voice response systems for telephone callers. Finally, voice interaction arrived on mobile devices for the first time in 2008 with the release of the Google Voice Search app for iPhones. This same technology would later be added to Google’s apps for the Chrome browser and Google Maps.
Today, voice recognition apps are ubiquitous across mobile devices, and their popularity is only growing. Apple’s Siri virtual assistant was processing 1 billion queries per week as of 2015, while 20 percent of Google searches on mobile are now done by voice.
VUI and Mobile Development
Voice interaction poses a unique challenge for mobile developers. As human beings, we are social, vocal creatures, which means that using our voice to communicate should be natural. However, voice interaction is still a significant departure from the keyboards and touchscreens that you might be accustomed to using. The traditional guidelines for designing for iOS or Android, or for graphical user interfaces in general, have very little relevance to VUIs.
One of the biggest challenges of VUI design is understanding your users’ expectations. Human beings have certain innate assumptions about how communication takes place between two speakers, which can often lead to ambiguity as the computer struggles to compensate for this missing context. For example, a query of “One lump or two?” would be perfectly understandable to a person asking for tea, but could easily trip up a virtual assistant.
Companies and developers, especially those with an app already in place, need to think hard about the ways that they can use VUI to truly enhance the user experience. When it comes to VUI, it’s all too easy to get it wrong by trying to hop on the bandwagon of the newest, hottest technology. For example, a company might decide to overhaul its telephone support line by installing a VUI just because it’s “easier” for callers, but fail to consider how callers will interact differently with the new VUI than they would with the old push-button keypad interface. Instead, consider how you can use VUI to supplement your existing brand and make it more approachable and helpful.
To address these challenges, mobile developers need to be explicit about what the voice recognition system can and cannot do. For example, a sports app with a VUI should inform users on launch that it can retrieve football and basketball scores from the past month, as well as individual players’ statistics for the season. At the same time, you should avoid overwhelming your users with too many options.
In addition, the app needs to make it very clear what question it’s answering so that users know that their query has been understood. Rather than simply giving a score such as “28–14,” the app should provide context such as the names of the teams and the date on which the game was played.
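As a rough illustration of this point, the sketch below turns a bare score into a spoken response that names both teams and the date. The function name, its parameters, and the team names in the usage note are hypothetical and chosen only for this example.

```python
from datetime import date


def describe_score(team_a, team_b, points_a, points_b, played_on):
    """Build a spoken response that carries its own context.

    Hypothetical helper: the name and parameters are assumptions
    made for illustration, not part of any particular VUI toolkit.
    """
    month = played_on.strftime("%B")
    when = f"{month} {played_on.day}, {played_on.year}"

    # A tie needs different wording from a win.
    if points_a == points_b:
        return f"The {team_a} and the {team_b} tied {points_a} to {points_b} on {when}."

    # Put the winner and the higher score first, as a person would say it.
    if points_a > points_b:
        winner, loser, high, low = team_a, team_b, points_a, points_b
    else:
        winner, loser, high, low = team_b, team_a, points_b, points_a
    return f"The {winner} beat the {loser} {high} to {low} on {when}."
```

Calling describe_score("Lions", "Bears", 28, 14, date(2016, 10, 2)) produces a full sentence rather than just “28–14,” so users can tell at once whether the app understood the question they asked.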
During conversations between two humans, it’s easy to tell whether someone is listening to you through body language cues like eye contact. Mobile devices, however, have no such form of expression (at least not yet). Instead, you should use simple visual feedback to inform your users that the VUI is listening to them as they speak.
To get the best performance out of your VUI app, it’s crucial to integrate it with conversational APIs such as api.ai or Amazon’s Alexa Skills Kit API. Services like Dropsource can connect to any REST API, making the integration process a snap.
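To give a sense of what talking to a conversational service over REST can look like, here is a minimal sketch that only builds the HTTP request and does not send it. The endpoint URL, the bearer-token header, and the payload fields ("query", "sessionId") are placeholders assumed for illustration; they are not the actual schema of api.ai or the Alexa Skills Kit, so check your provider’s documentation for the real one.

```python
import json
import urllib.request


def build_query_request(endpoint, token, utterance, session_id):
    """Package a transcribed utterance as a JSON POST request.

    Hypothetical sketch: the payload field names and the auth
    scheme are assumptions, not any real service's API.
    """
    payload = {"query": utterance, "sessionId": session_id}
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

In a real app you would pass the returned request to urllib.request.urlopen (or the HTTP client of your mobile platform) and parse the JSON reply into the text your app speaks back.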
Of course, good VUI design is much harder than it looks. Some of the practices to avoid while building your VUI include:
• Not asking the user a question when the app expects a response.
• Not being clear about the user’s options. For example, “Which sport do you want scores for, football or basketball?” is a better question than “Do you want scores for football or basketball?”, since the user might simply respond “Yes.”
• Giving the user too many choices or being too verbose (for example, “Say ‘football’ for football. Say ‘basketball’ for basketball…”).
• Confirming the user’s query too often. Confirmations should be reserved for important actions such as sending a message or making a purchase.
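Two of the practices above can be sketched in a few lines: phrasing a choice so that “Yes” is not a sensible answer, and reserving confirmations for important actions. The function names and the set of high-stakes actions are hypothetical and would differ from app to app.

```python
# Hypothetical action names; an app would list whatever is costly to undo.
HIGH_STAKES_ACTIONS = {"send_message", "make_purchase"}


def build_choice_prompt(question_stem, options):
    """Phrase a choice as an open question followed by the options."""
    if len(options) < 2:
        raise ValueError("a choice prompt needs at least two options")
    if len(options) == 2:
        listed = f"{options[0]} or {options[1]}"
    else:
        listed = ", ".join(options[:-1]) + f", or {options[-1]}"
    return f"{question_stem}, {listed}?"


def needs_confirmation(action):
    """Confirm only actions that are hard to undo, not every query."""
    return action in HIGH_STAKES_ACTIONS
```

With these helpers, build_choice_prompt("Which sport do you want scores for", ["football", "basketball"]) yields the better of the two questions quoted above, and needs_confirmation("get_score") returns False, so a routine score lookup is answered without an extra confirmation step.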
Hey Siri…Call Mom
Judged by the principles above, one of the most successful examples of VUI design is Apple’s Siri virtual assistant. When users start Siri, they’re presented with a list of ideas such as “Call Hannah” and “Play some rock music.” In addition, the system provides both visual feedback (in the form of a glowing white line) and tactile feedback to let users know when it’s listening. And rumor has it that millions of moms across the world have been called a little more often thanks to Siri’s help.
Designing great VUIs isn’t easy, but there’s little doubt that voice interaction is the way of the future. As more and more users use voice-controlled virtual assistants and apps on their mobile devices, getting your VUI right will be ever more important.