Voice interfaces are not new to healthcare. Many physicians have used, or still use, Dragon NaturallySpeaking (DNS) to transcribe their clinical notes. What is new is the arrival of digital assistants, powered by natural language processing, ambient computing, and machine learning. Their arrival is going to change healthcare, and it won’t just be physicians who benefit – nurses, technicians, and patients will too.
When Siri arrived in 2011, it sparked a new wave of interest in voice interfaces. It was a paradigm shift that captured the public imagination: we could finally just talk to our devices, as in so many science fiction stories. When working with new clients, people often asked if we could add a voice interface “just like Siri.” Unfortunately, Siri was never meant as a development platform, so the momentum stalled.
Then, in November 2014, Amazon launched Alexa, another natural language interface, and added two important new elements. First, it made the interface accessible and created a marketplace for new voice applications, similar to what Apple did with their mobile app store. Having an accessible interface allowed developers to get behind the technology and innovate. It wasn’t long before new skills came flooding out, covering everything from sending SMS messages to controlling our lights and thermostats. To this day, the Alexa skills marketplace continues to grow, with about 50 new skills created every day.
Second, the Amazon Echo introduced us to a new concept – ambient computing. The ability for the technology to meld into the background and become invisible created a new freedom. No longer tethered to a device, we could simply speak our desires and let the assistant take care of the rest.
Of course, this all sounds great, but people have been predicting the rise of voice interfaces on computers for generations, so what makes it different now? In part, the technology has finally caught up with humans. We’re finding voice interfaces more useful, and less awkward to use. Just look at some of the statistics:
- Major speech recognition platforms have achieved accuracy over 95%, which is on par with humans
- 50% of all searches will be voice searches by 2020 - comScore
- 65% of people who own an Amazon Echo or Google Home can’t imagine going back to the days before they had a smart speaker - GeoMarketing
- 72% of people who own a voice-activated speaker say their devices are used as part of their daily routines - Google
All this suggests that voice interaction is not just another fad, it’s becoming a part of our everyday lives.
What Distinguishes Voice Interfaces?
Voice allows us to interact at a distance, even when we cannot see what we are interacting with. This means that our concentration and focus can be on a separate task. Speech is also one of our most efficient forms of data input. Most people speak around 150 words per minute, compared with an average typing speed of 40. Combined, these two benefits allow us to make relatively complex requests quickly.
However, in spite of our ability to speak quickly, cloud-based speech recognition tends to introduce additional latency before a response is provided. This is due to the speech recognition system waiting to detect when you’ve stopped speaking (and not just paused mid-sentence) and to a lesser extent, the round trip time to the recognition service itself. This tends to be less of an issue if the speech recognition is being performed locally on the device, which may be the case for small command and control grammars.
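The endpoint-detection delay described above can be illustrated with a minimal sketch. This is an energy-based end-of-utterance detector; the frame length, threshold, and hangover period are illustrative assumptions, not values from any particular recognition service.

```python
# Minimal sketch of energy-based end-of-utterance detection, one source of
# the latency described above. Thresholds and frame sizes are illustrative.

SILENCE_THRESHOLD = 0.01   # RMS energy below this counts as silence
HANGOVER_FRAMES = 25       # ~500 ms of 20 ms frames before declaring "done"

def end_of_utterance(frames):
    """Return the index of the frame where speech is judged complete,
    or None if the speaker never stays silent for long enough."""
    silent_run = 0
    for i, energy in enumerate(frames):
        if energy < SILENCE_THRESHOLD:
            silent_run += 1
            if silent_run >= HANGOVER_FRAMES:
                return i  # the recognizer only responds after this delay
        else:
            silent_run = 0  # a mid-sentence pause resets the counter
    return None

# 10 frames of speech followed by silence: detection fires at frame 34,
# i.e. ~500 ms after the speaker actually finished.
print(end_of_utterance([0.2] * 10 + [0.001] * 30))  # → 34
```

The hangover period is the trade-off at the heart of the latency: shorten it and the system cuts people off mid-sentence; lengthen it and every response feels sluggish.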
Where is the Medical Device Industry Headed?
When it comes to medical devices, there are so many different types of devices used in so many different settings that it’s impossible to look at them all at once. However, there are a few key use cases where digital voice assistants have real potential to shine and impact the industry.
The Examination Room
Eric Schmidt from Google highlighted this use case in his keynote at the HIMSS 2018 conference. Whether it’s a primary care physician or a specialist practice, having a listening device in the examination room has a lot of potential for capturing clinical notes, identifying billing codes, or even providing clinical decision support during the encounter.
A good digital assistant may update the electronic medical record (EMR) with relevant information just from the back and forth conversation between you and your doctor. It may also be able to provide decision support and propose and prepare scripts silently in the background for review and signature by the physician. Using a digital assistant in this manner has the potential to free up physicians to focus on their patients rather than their EMRs.
In this scenario, it’s likely that the voice interface would be primarily for listening with feedback provided via a different medium (such as a computer screen), so as not to interrupt the flow of the encounter. This would also provide the opportunity for the physician to review any decision support material before sharing it with the patient.
The Operating Room
Being a touchless interface, there’s a lot to be said for interacting with devices in a sterile environment via voice. The most common concern is whether the surgical mask will muffle the sound too much, but so far this has not been an issue. If the environment is quiet enough for you to hear and hold a conversation with a human on the other side of the room, then speech recognition can be expected to work equally well.
If the device is being used in a noisy environment, such as while drills or other devices are being used, techniques like limiting the size of the grammars to improve recognition accuracy may help, or it may be necessary to wait until noisy activities subside.
The Recovery Room
Whether a patient is staying in a hospital room or has been discharged to recover at home, voice interfaces represent a new opportunity to connect patients into their ecosystems, especially when they have restricted mobility. Starting with simple things like dimming the lights, adjusting the temperature of a room, or controlling audio levels, voice can empower users to maintain control of their environment. Moving to more integrated options, voice can also order food, request nursing assistance (and articulate the reason for the assistance so that nurses can prioritize appropriately), or find out more about a medical condition from trusted sources such as HealthWise and Health Navigator. Patients can even use it as a notepad to record questions for their physician’s next visit, or notes that can be forwarded to their care team.
One challenge still faced by these systems is speaker identification and maintaining patient privacy. Currently, voice recognition (the ability to identify the speaker) is something that most speech recognition systems support to a limited degree. As these systems become more powerful, we can expect to see more secure interfaces arrive that handle sensitive patient information such as accessing patient records.
Surveys, Feedback, and Clinical Trials
Surveys and clinical trials represent bountiful opportunities to simplify the user interaction and increase patient engagement. By providing a voice interface, we provide another touch point to gather information and allow users to do this while they’re completing other tasks. Imagine being able to report your daily test results in the bathroom while brushing your hair or in the kitchen making breakfast.
Companies like Amgen have started adding voice interfaces to make it easier for patients to complete their daily journal, and Orbita is working with clinical trial data firm ERT to include voice as a component of data collection (complete surveys, verify completion, and report health concerns).
Integration with Care Management Platforms
Care management platforms monitor users and provide feedback to clinical staff. Reporting data direct from medical devices is one thing, but in the context of a care management platform, augmenting that with a voice interface allows the platform to start a conversation about the context around your data. Why have your blood glucose readings been high for the past few days? If we can gather information from multiple devices to identify that you’re not sleeping well and have put on weight, a digital assistant can enquire about what’s happening in your life, leading to richer sources of information and potentially earlier interventions.
Home and Elder Care
The introduction of Amazon Alexa spawned many new use cases for voice in the home – from reminders to home automation and personal or medical alert response systems. A small trial by the Front Porch Center for Innovation and Wellbeing in California found that by the end of the trial, 100% of participants felt that Alexa overall made their life easier. However, the trial also highlighted some of the challenges of voice interfaces for the elderly, one being the limited ability to adjust treble and bass for those with hearing impairments (at least Google Home now provides this feature).
Voice interfaces are also finding their way into medical products through less direct mediums by being leveraged for sales and marketing tools. Explanations of features, comparisons with competitor products, trivia, and quiz-based challenges are all ways that have been used to engage users and sales personnel without needing to make changes to the core functionality of a product or impacting its regulatory approval.
These approaches provide an opportunity to start working with and learning the nuances behind voice interfaces before integrating them into the core functionality of a medical device.
Challenges Against Widespread Adoption
Some obstacles remain for the widespread adoption of voice in a medical setting. Most notably, none of the major voice service vendors yet provide a HIPAA compliant platform for handling speech, although Amazon is rumored to be supporting HIPAA compliance in some form later this year.
While this limitation has slowed some organizations down, others are moving forward with the expectation that speech technologies will be compliant by the time their systems are complete. Others are using carefully designed architectures to de-identify the entire speech pipeline and provide patient matching in an alternative HIPAA compliant environment.
There are also well-known challenges with the Alexa skills marketplace, where discoverability of new skills is problematic. Voice interfaces are not well suited to responding with long lists of information or for browsing. However, voice interfaces in the medical environment are typically bundled with other components, such as companion mobile apps or a physical device, rather than being discovered directly on the marketplace.
Cumbersome Invocation Phrases
Using Alexa Skills or Google Actions currently requires a slightly cumbersome turn of phrase to invoke the software component that you want to interact with. Having to say “Alexa, tell [My Skill] to do something” can be difficult to remember for skills you don’t use regularly, and more than a little repetitive when you use them often. While this is currently necessary to safeguard privacy, we should expect simplifications for interacting with individual devices in the future. The same applies to interrupting a device once it has started responding, and to chaining requests together so that we can ask for multiple things at once – although we’re already starting to see support for these functions.
Working with Voice Interfaces
Voice user interfaces can be broken down into three different types:
- Dictation
- Dictation interfaces are often invoked using a non-verbal mechanism and are intended to accurately transcribe what the user is saying until they are finished.
- Command and Control
- A command and control interface typically has a limited set of words or phrases that it understands, and it responds by performing the action requested by the user. Examples of this could be “Start”, “Stop”, “Next Image”, “Zoom In” etc. These interfaces are particularly useful when touching the devices is infeasible or undesirable and can be implemented without cloud support.
- Natural Language
- Natural language interfaces are an attempt to interpret human speech as it would be used in everyday conversation. While it may include phrases to control something, there will likely be many different ways to express the same intent. It may also be mixed with dictation activities and search expressions.
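A command and control interface of the kind described above can be remarkably small. The following sketch maps a fixed grammar to actions, which is why it can run entirely on-device without cloud support; the specific commands and handlers are illustrative, not taken from any real product.

```python
# A toy command-and-control interface: a fixed grammar mapped to actions.
# Because the vocabulary is tiny, it can be matched locally on the device.

GRAMMAR = {
    "start": lambda: "starting procedure",
    "stop": lambda: "stopping procedure",
    "next image": lambda: "advancing to next image",
    "zoom in": lambda: "zooming in",
}

def handle_utterance(text):
    """Normalize the recognized text and dispatch to a handler,
    rejecting anything outside the grammar."""
    phrase = text.strip().lower()
    action = GRAMMAR.get(phrase)
    if action is None:
        # A small grammar means out-of-vocabulary speech is simply ignored,
        # which also improves recognition accuracy in noisy environments.
        return "unrecognized command"
    return action()

print(handle_utterance("Next Image"))            # → advancing to next image
print(handle_utterance("open the pod bay doors"))  # → unrecognized command
```

Contrast this with a natural language interface, where “zoom in”, “get closer”, and “magnify that” would all need to resolve to the same intent.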
Dictation and Command and Control interfaces have been around for many years and are relatively straightforward depending on the content or set of actions that can be performed. However, Natural Language interfaces introduce new challenges in order to make them as user-friendly as possible.
Like any interface, understanding who your users are and their context of use is essential to building an interface that people will enjoy using. Voice interfaces often fit into a multi-modal design where both a visual and verbal input or output are provided. This allows the user access to the medium best suited to the information.
When voice interfaces need to read back information, consider the tone and personality that you want your device to have, and aim to be clear and concise in your responses. It’s also important to keep your vocabulary consistent as the project grows. Build word lists and dictionaries that you can refer back to.
When working with surveys, users will often need more help and guidance when they first start, but once they become familiar with the questions, they will want to answer them more quickly, often answering multiple questions in a single response. Regardless of how the responses are provided, use the content to prune the conversation tree so that you’re only asking relevant questions.
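Pruning the conversation tree can be sketched as a list of questions, each with an optional condition on the answers gathered so far. The question IDs, texts, and the follow-up rule below are hypothetical, purely to show the shape of the technique.

```python
# Sketch of pruning a survey's conversation tree based on earlier answers.
# Questions and conditions are illustrative, not from any real survey.

QUESTIONS = [
    {"id": "pain", "text": "Did you experience any pain today?"},
    {"id": "pain_level", "text": "On a scale of 1 to 10, how bad was it?",
     # Only relevant if the patient reported pain
     "ask_if": lambda answers: answers.get("pain") == "yes"},
    {"id": "medication", "text": "Did you take your medication?"},
]

def next_question(answers):
    """Return the next relevant unanswered question, skipping any whose
    ask_if condition is not met by the answers gathered so far."""
    for q in QUESTIONS:
        if q["id"] in answers:
            continue  # already answered (possibly in a multi-part response)
        condition = q.get("ask_if")
        if condition and not condition(answers):
            continue  # prune: this branch is irrelevant
        return q["text"]
    return None  # survey complete

# The patient said "no" to pain, so the follow-up is skipped entirely.
print(next_question({"pain": "no"}))  # → Did you take your medication?
```

The same structure handles users who answer several questions in one breath: mark every answered ID, then ask whatever relevant question remains.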
If you want to create a natural sounding conversation, consider introducing micro-interactions to help it feel less structured. Feedback like “yes”, “yep”, “got it”, “good”, “great”, “thanks” can help reinforce a message, whereas phrases like “er”, “um”, “hang on”, “one sec” can help indicate that the context of a conversation may be changing. A great example of this is Google’s Duplex, introduced in 2018 at their annual Google I/O conference.
Sooner or later, you will encounter failure scenarios in your voice interface: the user’s utterance was not recognized properly, they didn’t respond, or they were interrupted and are now having a completely separate conversation. Inferring the cause of the failure is often difficult, so context becomes critical to determining how to handle it. Options may include repeating the question, repeating the answer, or both, pausing, or even ending the conversation. With any medical device, we need to consider the risk of the voice interface failing and have appropriate mitigations in place.
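One way to structure this handling is to track retries per question and escalate from re-prompting to ending the conversation gracefully. The threshold and prompt wording below are assumptions for illustration only; a real medical device would choose its mitigations from its risk analysis.

```python
# Illustrative failure-handling policy: re-prompt a limited number of
# times, then end the conversation rather than frustrate the user.

MAX_RETRIES = 2

def handle_failure(question, retries):
    """Decide how to respond when an utterance was not understood.
    Returns (action, prompt, updated_retry_count)."""
    if retries < MAX_RETRIES:
        # Repeat the question, rephrasing it on the second attempt
        prompt = question if retries == 0 else f"Sorry, once more: {question}"
        return ("reprompt", prompt, retries + 1)
    # Retries exhausted: exit gracefully and defer the question
    return ("end", "Let's come back to this later.", retries)

# Two failed attempts are re-prompted; the third ends the conversation.
retries = 0
while True:
    action, prompt, retries = handle_failure("How did you sleep?", retries)
    if action == "end":
        break
print(action)  # → end
```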
Finally, voice interfaces, just like all other applications, need to evolve over time. Make sure to include the appropriate instrumentation into your applications so that you can monitor how they’re being used and plan for updates to reduce friction whenever it is identified.
In conclusion, voice interaction is already becoming part of our everyday experience, and it’s natural that this will converge with our healthcare needs. Pilot projects are already applying this technology in healthcare, many of them still in a learning phase focused on establishing best practices for the medical devices of tomorrow. If you’re not using a voice interface already, start – it’s the best way to learn how they operate, and they’re not going away any time soon.