Unacceptable, where is my privacy? Exploring Accidental Triggers of Smart Speakers

Paper Title

Unacceptable, where is my privacy? Exploring Accidental Triggers of Smart Speakers

Authors

Lea Schönherr, Maximilian Golla, Thorsten Eisenhofer, Jan Wiele, Dorothea Kolossa, Thorsten Holz

Abstract

Voice assistants like Amazon's Alexa, Google's Assistant, or Apple's Siri have become the primary (voice) interface in smart speakers that can be found in millions of households. For privacy reasons, these speakers analyze every sound in their environment for their respective wake word, like "Alexa" or "Hey Siri," before uploading the audio stream to the cloud for further processing. Previous work reported on inaccurate wake word detection, which can be tricked using similar words or sounds like "cocaine noodles" instead of "OK Google." In this paper, we perform a comprehensive analysis of such accidental triggers, i.e., sounds that should not have triggered the voice assistant, but did. More specifically, we automate the process of finding accidental triggers and measure their prevalence across 11 smart speakers from 8 different manufacturers using everyday media such as TV shows, news, and other kinds of audio datasets. To systematically detect accidental triggers, we describe a method to artificially craft such triggers using a pronouncing dictionary and a weighted, phone-based Levenshtein distance. In total, we have found hundreds of accidental triggers. Moreover, we explore potential gender and language biases and analyze the reproducibility. Finally, we discuss the resulting privacy implications of accidental triggers and explore countermeasures to reduce and limit their impact on users' privacy. To foster additional research on these sounds that mislead machine learning models, we publish a dataset of more than 1000 verified triggers as a research artifact.
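The abstract names a weighted, phone-based Levenshtein distance but does not give its weights. The following is a minimal sketch of the general idea only, not the authors' implementation: the substitution costs, the `example_sub_cost` helper, the `VOWELS` set, and the phone sequences are all illustrative assumptions (ARPAbet-style symbols with stress markers removed).

```python
from typing import Callable, Sequence

# Assumed ARPAbet-style vowel symbols, used only by the toy cost function below.
VOWELS = {"AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
          "EY", "IH", "IY", "OW", "OY", "UH", "UW"}


def example_sub_cost(p: str, q: str) -> float:
    """Hypothetical weighting: swapping two phones of the same broad class
    (vowel/vowel or consonant/consonant) is cheaper than crossing classes."""
    if p == q:
        return 0.0
    if (p in VOWELS) == (q in VOWELS):
        return 0.5
    return 1.0


def weighted_levenshtein(
    a: Sequence[str],
    b: Sequence[str],
    sub_cost: Callable[[str, str], float] = example_sub_cost,
    ins_cost: float = 1.0,
    del_cost: float = 1.0,
) -> float:
    """Edit distance between two phone sequences with weighted substitutions."""
    # prev[j] holds the cost of turning the first i-1 phones of a
    # into the first j phones of b.
    prev = [j * ins_cost for j in range(len(b) + 1)]
    for i, pa in enumerate(a, start=1):
        curr = [i * del_cost]
        for j, pb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + del_cost,           # delete pa
                curr[j - 1] + ins_cost,       # insert pb
                prev[j - 1] + sub_cost(pa, pb),  # substitute or match
            ))
        prev = curr
    return prev[-1]


# Illustrative phone sequences (assumed, not taken from the paper).
wake_word = ["AH", "L", "EH", "K", "S", "AH"]   # roughly "Alexa"
candidate = ["AH", "L", "EH", "K", "T", "AH"]   # made-up near neighbor
distance = weighted_levenshtein(wake_word, candidate)
```

Under these assumed weights, candidates with a small distance to the wake word's phone sequence would be the ones worth testing against a real device; a realistic cost function would presumably be derived from measured phone confusability rather than a two-class rule.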
