Implementing a Virtual Assistant for Your Organization: What to Consider

The 2019 Horizon Report stated that widespread adoption of virtual assistant technology is expected within four to five years. Machine learning advancements have improved the speed and accuracy of automatic speech recognition (ASR) and natural language processing (NLP), which power services like Siri, Google Assistant, Alexa, Cortana, Bixby, and others. ASR helps interpret the sound of words by transcribing speech into text while NLP helps define and extract the meaning of words, and both play a big role in how virtual assistants work.

Statista reports that by 2022, 66.3 million US households will own a smart speaker. In addition to gaining popularity in homes, smart speakers and other virtual assistants are being used in academia to help enrich the college experience. Voice assistants provide a conversational, and in many cases more intuitive, way to communicate with machines. However, these devices are always listening (i.e., passively waiting for a wake word), and this issue must be researched thoroughly before going live with any virtual assistant, especially one used in a public setting.

Beyond privacy concerns, there are other questions to ask when planning to implement a virtual assistant in an organization. First, determine the purpose of developing a voice assistant. What does the organization want it to do? Is the plan to extend an existing service to incorporate voice technology or to start something completely new? Is the purpose to eliminate repetitive tasks, to guide users to quality information, and/or to find solutions so users can focus on what is most important? Will the voice assistant simply answer basic questions, or will it learn from its interactions with humans? Cleo, an Alexa Skill, awards badges to users who help Alexa learn more about languages and cultures so “she” can be multilingual and better understand and interact with all people. Remember that voice assistants leave users’ hands and eyes free to do other, hopefully more meaningful, tasks.

Second, decide which voice assistant platform(s) to use. Surveying potential users to gauge their interest in a voice assistant, or to determine which one(s) they use most frequently, is a good idea. That way, an organization can determine whether an Alexa Skill, a Google Action, or something else is the way to go. If possible, consider building your voice assistant for multiple platforms so everyone has equitable access to it; unfortunately, at this time, doing so is neither easy nor affordable. Multi-language input and support is another vital consideration. Many application programming interfaces (APIs) can handle this, but a developer will need to know how all the parts fit into one cohesive application. An API allows applications to communicate with one another and typically returns data, often in JSON or XML format, to your application.
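To make the idea of an API returning JSON concrete, here is a minimal Python sketch. The payload and its field names ("query", "language", "answer") are hypothetical and do not come from any particular vendor's API; a real service would define its own structure.

```python
import json

# Hypothetical response body; the field names are illustrative only.
SAMPLE_RESPONSE = (
    '{"query": "library hours", "language": "en", '
    '"answer": "The library is open from 8 a.m. to 10 p.m."}'
)


def extract_answer(raw_json: str) -> str:
    """Parse a JSON response body and return the text the assistant should speak."""
    data = json.loads(raw_json)
    # Fall back to a generic message if the expected field is missing.
    return data.get("answer", "Sorry, I could not find an answer.")


if __name__ == "__main__":
    print(extract_answer(SAMPLE_RESPONSE))
```

The application only needs to know which field holds the answer; everything else about the remote service stays hidden behind the API.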

Billions of devices, including cars, watches, speakers, televisions, and more, have voice assistant technology built into their operating systems, so a voice assistant can reach users on hardware they already own. One drawback, however, is that voice assistant platforms are owned by big companies like Google, Apple, Microsoft, and Amazon, so third-party developers must follow these companies’ rules and pay their fees to modify the voice assistant for an organization’s needs. Worth noting is the open Web Speech API, which gives web applications the ability to handle voice data, specifically text-to-speech and speech recognition. It is also possible to customize the voice with Speech Synthesis Markup Language (SSML), so a completely open and personalized voice assistant can be built without using a proprietary voice assistant platform, although doing so would require a lot of work.
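As a rough illustration of what SSML looks like, the Python sketch below assembles a small SSML document using the standard speak, voice, and prosody elements. The function name and the voice name "ExampleVoice" are assumptions made for this example; which voices and attributes are actually honored depends on the speech engine that receives the markup.

```python
import xml.etree.ElementTree as ET


def build_ssml(text: str, voice_name: str, rate: str = "medium") -> str:
    """Wrap text in SSML that requests a named voice and a speaking rate."""
    speak = ET.Element("speak")
    voice = ET.SubElement(speak, "voice", {"name": voice_name})
    prosody = ET.SubElement(voice, "prosody", {"rate": rate})
    # ElementTree escapes characters such as & and < when serializing.
    prosody.text = text
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    # "ExampleVoice" is a placeholder, not a real voice identifier.
    print(build_ssml("Welcome to the library.", "ExampleVoice", rate="slow"))
```

Building the markup with an XML library rather than string concatenation keeps the output well-formed even when the spoken text contains special characters.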

Northeastern University gave 60 students an Amazon Echo Dot that would provide answers to their top 20 Call Center questions. This eliminated the usual long wait to speak to a person at the Call Center and the need to search online for answers to common questions. Getting an immediate answer from Alexa frees up time for students to focus on more important things, like learning and being more engaged with their college culture. Each student who received a device signed a Family Educational Rights and Privacy Act (FERPA) form to allow access to their student records. Saint Louis University has over 2,300 Echo Dots, located in residence hall rooms and student apartments, that have been programmed to answer over 100 questions.

In these two university cases, the virtual assistant was programmed as an Alexa Skill. Northeastern University worked with n-powered, who created an OpenAPI that connects to their university system. An estimated cost for services was not available on the n‑powered website. A quote and consultation were requested but had not been received by the time of this writing. In 2019, third-generation Echo Dots cost around $50; however, Amazon Education offers discounts when purchasing devices in bulk. It is fairly safe to assume that the other big companies mentioned earlier would offer similar discounts. To many, such discounts may seem like a marketing ploy, and this is another issue to consider if an organization desires to incorporate virtual assistant technology into its arsenal of services.

Creating an Alexa Skill, Google Action, or other voice commands requires setting up a service that takes voice requests, interprets them, and sends back a response to the device. Intents, utterances, slots, invocations, actions, and other functions are included in various development environments (e.g., Alexa Developer Console, Actions Console, Snips Voice Platform for Raspberry Pi, Azure AI, and others) that allow developers to create the basis of a voice assistant. Although machine learning technologies do most of the behind-the-scenes magic (i.e., ASR, NLP), developers need to spend time configuring the voice assistant, which, depending on the complexity of the system, may take a few hours to several weeks to build, test, and deploy.
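To show how an invocation name, intents, utterances, and slots fit together, the following Python sketch builds a small configuration modeled loosely on the JSON interaction model used for Alexa Skills. The invocation name, intent name, and sample utterances are hypothetical, and the exact schema should be confirmed against the platform's documentation. The helper function is an illustrative consistency check, not part of any SDK.

```python
import json
import re

# Hypothetical model: one intent with two sample utterances and one slot.
interaction_model = {
    "interactionModel": {
        "languageModel": {
            "invocationName": "campus helper",
            "intents": [
                {
                    "name": "LibraryHoursIntent",
                    "slots": [{"name": "day", "type": "AMAZON.DayOfWeek"}],
                    "samples": [
                        "when is the library open",
                        "what are the library hours on {day}",
                    ],
                }
            ],
        }
    }
}


def undeclared_slots(model: dict) -> list:
    """Return (intent, slot) pairs used in an utterance but never declared."""
    problems = []
    for intent in model["interactionModel"]["languageModel"]["intents"]:
        declared = {slot["name"] for slot in intent.get("slots", [])}
        for sample in intent.get("samples", []):
            for used in re.findall(r"\{(\w+)\}", sample):
                if used not in declared:
                    problems.append((intent["name"], used))
    return problems


if __name__ == "__main__":
    print(json.dumps(interaction_model, indent=2))
    print("Undeclared slots:", undeclared_slots(interaction_model))
```

Here the utterances are the phrases a user might say, the intent is the action those phrases map to, and the slot is the variable part of the phrase (a day of the week).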

Once the voice assistant has been configured and tested on the simulated device built into the development environment, most organizations choose to host their application with the same company. Amazon’s AWS Lambda allows developers to upload their code without worrying about administering servers. Lambda is reportedly easy to use as long as your Skill does not store user information. Some academic Skills do connect users to their university systems, so the time and money needed to configure Lambda will need to be calculated into the proposal.
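A minimal sketch of the kind of code one might upload to Lambda is shown below. It assumes the JSON request and response layout used by custom Alexa Skills and a Python handler with the conventional (event, context) signature; the intent name and the answer text are hypothetical, and the sketch stores no user information.

```python
# Hypothetical mapping from intent names to canned answers.
ANSWERS = {
    "LibraryHoursIntent": "The library is open from 8 a.m. to 10 p.m.",
}


def build_response(text, end_session=True):
    """Build a plain-text speech response in the Alexa response format."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }


def lambda_handler(event, context):
    """Entry point: look at the request type and return a spoken reply."""
    request = event.get("request", {})
    if request.get("type") == "LaunchRequest":
        # The user opened the Skill without asking a question yet.
        return build_response("Welcome. What would you like to know?", end_session=False)
    if request.get("type") == "IntentRequest":
        name = request.get("intent", {}).get("name")
        answer = ANSWERS.get(name)
        if answer:
            return build_response(answer)
    return build_response("Sorry, I do not know that yet.")


if __name__ == "__main__":
    sample_event = {"request": {"type": "IntentRequest",
                                "intent": {"name": "LibraryHoursIntent"}}}
    print(lambda_handler(sample_event, None))
```

Because the handler is an ordinary function, it can be tested locally with sample events before it is deployed.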

Google Cloud Platform is a suite of cloud computing services that includes the APIs that run Google Assistant and a variety of other services. At the time of this writing, Google was offering a $300 credit that can be used with your projects. Many of Google’s APIs cost money per transaction, so if a proposed voice assistant is forecast to receive a lot of use, this will need to be factored into the total cost of implementing such a service. For instance, Google’s Cloud Speech-to-Text API is free if the app uses less than 60 minutes a month. After 60 minutes, it costs $0.006 for every 15 seconds. Google’s APIs are getting more powerful (e.g., translation, audio transcription, speaker diarization) as artificial intelligence advances, but keep in mind that everything will have to go through Google’s servers to work. Registering for these accounts usually requires a credit card on file, but an organization only has to pay fees when compute time is used.
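The pricing figures above can be turned into a quick budget estimate. The Python sketch below uses only the numbers quoted in this article (60 free minutes per month, then $0.006 per 15 seconds) and assumes, for illustration, that a partial 15-second block is billed as a full one. Current prices and rounding rules should be checked with Google before budgeting.

```python
import math

FREE_SECONDS = 60 * 60        # first 60 minutes each month are free
INCREMENT_SECONDS = 15        # usage is priced in 15-second blocks
PRICE_PER_INCREMENT = 0.006   # dollars per block, as quoted in the article


def monthly_cost(total_seconds: float) -> float:
    """Estimate the monthly speech-to-text cost in dollars."""
    billable = max(0, total_seconds - FREE_SECONDS)
    # Assumption: a partial block is rounded up to a full block.
    increments = math.ceil(billable / INCREMENT_SECONDS)
    return round(increments * PRICE_PER_INCREMENT, 3)


if __name__ == "__main__":
    ten_hours = 10 * 60 * 60
    print(f"Estimated cost for 10 hours of audio: ${monthly_cost(ten_hours):.2f}")
```

At these rates, each billable minute costs $0.024, so ten hours of audio in a month (nine of them billable) comes to about $12.96.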

Once your voice assistant is done, write a clear description so users can find and enable it easily. It is also vital to create an easy-to-remember name to invoke the voice assistant. Remember that people will be speaking this name aloud, so keep it clear and simple, and test it with a variety of people to make sure it is intuitive for everyone before publicly releasing it.

Beyond answering simple questions, the next step may be to offer research support, tutoring, translation, and other academic services. VoxScholar has a few Google Assistant actions that offer study tips and customized lab tutors to help students at the University of Colorado Denver and its Anschutz Medical Campus meet their learning goals. Before one can use VoxScholar, however, Google Assistant will ask to connect the student or faculty member to their university credentials. Library vendors are starting to add voice commands to their applications, too. For example, EBSCO’s Discovery Service API allows users to search and access content via Alexa and Google Home.

Voice assistant technology is in its infancy. In the near future, imagine an “information‑literate” voice assistant that can help streamline the research process. It will suggest articles while scanning emails and other documents to help us clearly pinpoint important information. It will use NLP to be more conversational (see Google Duplex) and to help users solve problems, better detect fake news, and be more astute information consumers. It will integrate RFID tags to locate and guide a user to items in a library while controlling a variety of Internet of Things (IoT) devices with simple voice commands. With video virtual assistants, it will send examined video to a variety of display screens, and much more.

Ultimately, the main goal in creating a virtual assistant would be to alleviate some of the mundane work humans do so they can focus more on actually being human. When virtual assistants work correctly, they can give humans more time to experience what truly matters in their lives. Finally, one should not feel obligated to create a virtual assistant just because it is innovative and popular. You may want to give this technology time to evolve and mature because, as time marches on, it will undoubtedly become easier and less expensive to build virtual assistants to accomplish many everyday tasks, giving humans more quality time using the information that will hopefully make our lives better.