Growing up, we were enthralled by the magic of casting spells and opening doors with a single command. Today, some of that magic is available to the average person through voice activation, backed by an elaborate machinery of programmed logic and hardware that carries out commands in a split second.
As the interface between a user and a product or service, voice technology (voicetech) has the potential to dramatically improve the way we interact with products, yet it is seldom treated as an area of focus in its own right. Hidden beneath the proliferation of text-based chatbots, the seemingly limitless potential of artificial intelligence (AI), and the continued efforts to improve natural language processing (NLP), voicetech quietly builds on the progress of many foundational technologies.
Moving from text-based chat to voice seems like a natural progression, since both rely on NLP. From voice messaging to road navigation to simple commands directed at a phone, voicetech is not unheard of in Southeast Asia. The foundations for it to take off have already been laid, with increasingly affordable smartphones, smart speakers and Internet of Things (IoT) devices making their way into more and more hands.
Voicetech: a culmination of various technologies and trends. Diagram taken from “Deloitte Tech Trends 2020”.
Creating abracadabra moments
In November 2019, Amazon unveiled a new voicetech feature, the Alexa Voice Service (AVS) Integration, as part of AWS IoT Core. This is a step up from the existing Amazon Alexa voice assistant, which controls software applications and mainly delivers information and entertainment. Device makers can now build Alexa into a much broader range of smart devices by offloading memory-intensive tasks to the cloud, which reduces the cost of building a smart physical environment. Developments like this point to more abracadabra moments in the coming months and years.
At the heart of every digitally enabled magical moment is effective speech recognition, which accurately captures spoken input as data that can then be processed. As our AI engineers at the Deloitte Cognitive Analytics Solution Centre remind us, investing in NLP requires a leap of faith because its results are not always explainable or predictable. As a standalone technology, voicetech does not translate directly into topline revenue. In fact, further resources are needed to develop use cases before the technology can be monetized.
The work done in foundational technologies is crucial in linking us to the next stage of possibilities. However, investments in these technologies often present a conundrum for companies in smaller markets, such as those in Southeast Asia, because of a perceived lack of scale and of short-term returns. What is changing the playing field is that big technology companies like Amazon and Google are open-sourcing their NLP tools or making their technology stacks available at low cost. This, coupled with abundant localization opportunities, has been spurring innovation and bringing the benefits of the technology to an otherwise overlooked market segment.
Serving local markets
Southeast Asia has benefited from open innovation, but it takes more than a technology stack to build a business. Indonesia has seen a crop of chatbot startups, such as Botika, Lenna.ai, Kata.ai and BJ Tech, emerge to serve a range of users, with offerings spanning concierge services, personal assistants and developer suites. Their grasp of the fluid, fast-changing nature of Indonesian slang matters because it determines the training data fed into their NLP engines, which in turn shapes the output and efficacy of their solutions. Their knowledge of local markets also helps them match their chatbots to suitable applications and communication channels.
Botika, founded in 2016, announced last year that it was expanding into IoT-based voice services. The move continues its steady expansion across an array of chatbot solutions, from enterprise applications to omnichannel solutions covering social media, as well as a voicebot for the Indonesian market.
Elsewhere in the region, Thai startup Zwiz.ai and Myanmar’s Expa.ai have made headway with business automation and analytics offerings delivered through chatbots, helped by their command of Thai and Burmese, which lets them build technology better suited to capturing local speech patterns. While these are not yet voice services, their NLP libraries are constantly being enriched and will contribute to language-specific voicetech in the future. Chatbot companies know that for their technologies to be adopted, they must serve the real functional needs of their users.
It’s not always what you say
Beyond bots that replace human customer service representatives, whether through text or voice, voicetech is becoming a functional part of security and access control. Some of the best business use cases for voicetech can already be seen in retail banking and related industries. As early as 2015, OCBC Bank in Singapore rolled out voice biometrics as an alternative way for customers to access banking services. Like fingerprints, voiceprints serve as unique identifiers, because they are based on frequency markers that can be digitally captured from a person’s voice. When a voiceprint is used in a voice transaction, it removes the need for question-and-answer verification and provides a smoother, and arguably more secure, experience for customers.
From a business point of view, the technology is also scalable, because voice authentication works regardless of the language spoken. Shenzhen-based startup VoiceAI Technologies may be focused mainly on the Chinese market, but in 2018 it was able to deploy its voiceprint technology quickly and efficiently for TASPEN, an Indonesian state institution, to disburse pensions to 2.5 million retirees. The deployment made identity verification far more convenient and saved labor costs in administering the pensions. Unlike voicetech applications that depend on speech recognition, voiceprint technology does not need intimate familiarity with the 580 languages and dialects spoken across the more than 6,000 inhabited islands of Indonesia to be accurate and effective for its users.
From take-off to landing
With smartphone penetration rising across Southeast Asia, we can expect more people to adopt voicetech as they use the many functions of their phones to stay connected and navigate their daily lives. And this is just the starting point.
One welcome prospect of voicetech is its ability to include “forgotten” groups of people, such as those who struggle with gadgets or who are illiterate. By blending into an “ambient” experience, well-deployed voicetech can feel seamless, requiring little effort and no change in habit while still bringing convenience and service to users. In the current landscape, we are looking at solutions such as DeloitteASSIST to transform hospitals into digitally enabled environments. For example, even when nurses are occupied with other important duties, patients can make requests to a DeloitteASSIST smart speaker; the underlying AI interprets and prioritizes each request and routes the right resources and responses to the patient.
As we explore more use cases and experiment with this technology, the full magic of voicetech remains to be seen. But it is clearer than ever that we are approaching a time when we can simply say the word and our wishes will come true.
About the authors:
Richard Mackender leads the Deloitte Southeast Asia (SEA) Innovation team, a cross-functional, cross-country unit dedicated to driving innovation as a long-term value creator across Deloitte’s Southeast Asia operations. This article was co-written with Chen Liyi and Michelle Felicia, who are members of the team.