Our vision for Alexa is not only to be useful but to markedly improve the lives of millions of people worldwide. The Alexa Trust and Alexa AI teams work every day to make this vision a reality. To help mark Data Privacy Day on January 28, we sat down with Anne Toth, Manoj Sindhwani, and Prem Natarajan to discuss how Amazon protects customer privacy while using data responsibly to improve the Alexa experience.
Why do you need to collect Alexa customer data, and how is it used?
Toth: First and foremost, Alexa can’t answer a question without collecting and processing that voice interaction, which seems obvious, right? But what makes it possible for Alexa to understand and respond accurately is all of the complex learning and constant refinement that make Alexa better and smarter for all our customers with each interaction. It takes data to do that. One specific example is understanding human speech. Speech is complex and varies substantially based on region, dialect, context, environment, and the individual speaker, including whether they are a native or non-native speaker of the language and whether they have a speech impairment. Training Alexa with customer data is incredibly important because, for Alexa to work well, the machine learning models that power it need to be trained on a wide, diverse range of real-world customer data. This is how we can ensure the service performs well for everyone, under all kinds of acoustic conditions, at home or on the go.
Multilingual Mode is a great example of how data makes it possible for Alexa to be both inclusive and accurate for diverse households. Hundreds of thousands of customers across the U.S. are using Multilingual Mode on their Alexa devices to switch seamlessly between English and another language, such as Spanish. Some customers have told us that Multilingual Mode helps their entire family access Alexa.
Sindhwani: Exactly. Data is what makes Alexa smart. Training our speech recognition models with the latest data patterns allows our teams to provide a useful, accurate, and even entertaining experience.
Training with voice recordings is why Alexa can distinguish whether a customer is asking for the weather in “Austin” or “Boston,” or tell the difference between “U2” and “YouTube.” And while customers did not ask Alexa to play songs by Lil Nas X when we introduced Alexa in 2014, training with voice recordings helped Alexa quickly learn all the varied ways customers pronounce his name and ask to play his music.
Training Alexa with data over time also helps Alexa accurately answer questions about events that happen only once every few years, like the Olympics or the World Cup. Understandably, customers ask Alexa about curling more often during the Winter Olympics, and these questions are easier to understand if Alexa has been trained on historical data. Similarly, quickly training Alexa with voice recordings also helps ensure accuracy on trending topics where there’s less historical knowledge, such as COVID-19, Brexit, or NBA champion Giannis Antetokounmpo.
Continuously training our machine learning models with customer data is the reason Alexa’s understanding of customer requests has improved by an average of 37% over the last three years across all languages.
How are your teams protecting customer privacy while continuing to innovate?
Toth: We talk a lot about how privacy is in Alexa’s DNA. The “microphone off” button, the physical camera shutter, and the light and audio indicators notifying customers when Alexa processes a request are all controls that customers can see, hear, and touch. While these controls are important, we believe customers should have privacy without having to take an extra step.
I’ve worked on privacy for most of my career. Privacy is often presented as a constraint and, in a way, it is. Constraints certainly spur creativity, but privacy has also become an opportunity for invention in its own right. Our science and speech teams have invested in programs that protect privacy and use data responsibly without requiring any action from the customer.
Natarajan: Voice assistants present unique privacy challenges because there are parts of the experience that customers cannot see or hear. When we do collect and use customer data, we keep it secure and use it responsibly. For example, when training our machine learning models, we use privacy-preserving methods to limit the amount and type of data used in our natural language understanding modeling environment. Advances such as teachable AI and on-the-fly self-learning let users customize their experiences and deliver ongoing performance improvements without the models having to be retrained.
We also continue to invest in anonymization and synthetic data generation techniques to further protect customer privacy.
Sindhwani: Our scientists and engineers invest in research and privacy-enhancing techniques to further improve Alexa speech recognition. Similar to the work Prem described, we are also developing new techniques that use synthetic data (training data generated by algorithms that mimic the real world) to improve our automatic speech recognition models. And we’ve taken steps to rely even less on supervised learning techniques, in which voice recordings are manually reviewed, through improvements in privacy-preserving techniques such as transfer learning, active learning, federated learning, and unsupervised or self-learning. Self-learning technologies learn entirely from customer interactions, through implicit and explicit feedback, without requiring manual labeling.
You describe privacy as an opportunity for invention. Can you tell us more about how that comes to life for customers?
Toth: There’s a lot of innovation happening around privacy, especially within the Alexa organization. One core privacy principle is to always try to give customers more value while using less data, which I see as not that different from how science has given us more computing power at lower cost. Do more with less. In the world of privacy, we call this data minimization. Examples include moving more data processing directly onto our devices, finding ways to de-identify data sooner, and building and refining privacy-preserving machine learning models. Behind the scenes, the team is investing in data minimization techniques such as reducing our reliance on supervised learning.
Natarajan: We are always exploring new techniques and investing in research for future implementation, especially advances in generalizable AI methodologies. For example, we are actively leveraging large, pre-trained models built from open-source data for few-shot and zero-shot learning, which reduces the need for customer data when developing deep learning models for conversational AI and related language understanding applications. We are also developing algorithms that de-identify the data used in model training and make our models robust against privacy attacks. Any of these advances could have tremendous benefits for our customers and further protect the data we use every day.
Are there any privacy-preserving advancements you’re excited to bring to customers?
Sindhwani: We’ve continued to invest in Alexa’s on-device speech-processing capabilities. Customers in the U.S. with compatible Echo smart speakers will soon be able to enable a new setting that allows the audio of their Alexa voice requests to be processed locally on the device, without being sent to the cloud. The voice recordings are deleted after on-device processing. To do this, we had to take automatic speech recognition models that used to be many gigabytes in size, required huge amounts of memory, and ran on massive servers in the cloud, and make them efficient enough to run on a single device. I’m particularly excited about innovations like this, which bring customers tangible benefits such as reduced latency and lower bandwidth consumption while offering more choice about what happens with their data.