Paper title
When the Differences in Frequency Domain are Compensated: Understanding and Defeating Modulated Replay Attacks on Automatic Speech Recognition
Paper authors
Paper abstract
Automatic speech recognition (ASR) systems have been widely deployed in modern smart devices to provide convenient and diverse voice-controlled services. Because ASR systems are vulnerable to audio replay attacks that can spoof and mislead them, a number of defense systems have been proposed to identify replayed audio signals based on the speakers' unique acoustic features in the frequency domain. In this paper, we uncover a new type of replay attack, called the modulated replay attack, which can bypass existing frequency-domain-based defense systems. The basic idea is to compensate for the frequency distortion of a given electronic speaker using an inverse filter that is customized to the speaker's transform characteristics. Our experiments on real smart devices confirm that modulated replay attacks can successfully evade existing detection mechanisms that rely on identifying suspicious features in the frequency domain. To defeat modulated replay attacks, we design and implement a countermeasure named DualGuard. We discover and formally prove that, no matter how the replay audio signals are modulated, replay attacks will either leave ringing artifacts in the time domain or cause spectrum distortion in the frequency domain. Therefore, by jointly checking suspicious features in both the frequency and time domains, DualGuard can successfully detect various replay attacks, including modulated replay attacks. We implement a prototype of DualGuard on a popular voice interactive platform, ReSpeaker Core v2. The experimental results show that DualGuard can achieve 98% accuracy in detecting modulated replay attacks.
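
The inverse-filter idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the speaker response `speaker_gain` is a made-up, amplitude-only model, and all function and variable names are hypothetical. The sketch only shows why pre-compensating a signal with the inverse of a known speaker response can cancel the spectral distortion that frequency-domain detectors look for. It does not model the time-domain ringing artifacts that, according to the abstract, DualGuard relies on.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (O(n^2)); fine for a short demo."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def speaker_gain(k, n):
    """Hypothetical speaker amplitude response that attenuates low frequencies.

    Assumption for illustration only; a real attacker would have to measure
    the response of the specific loudspeaker.
    """
    f = min(k, n - k) / (n / 2)   # normalized frequency in [0, 1]
    return 0.2 + 0.8 * f          # never zero, so it can be inverted


def apply_gain(spectrum, gain):
    """Scale each frequency bin by the corresponding gain."""
    return [s * g for s, g in zip(spectrum, gain)]


def spectral_error(a, b):
    """Largest per-bin difference between two magnitude spectra."""
    return max(abs(abs(x) - abs(y)) for x, y in zip(a, b))


n = 64
# Toy "genuine" signal: a low tone plus a weaker high tone.
signal = [math.sin(2 * math.pi * 3 * t / n)
          + 0.5 * math.sin(2 * math.pi * 20 * t / n)
          for t in range(n)]
genuine = dft(signal)

gain = [speaker_gain(k, n) for k in range(n)]
inverse_gain = [1.0 / g for g in gain]

# Classic replay: the speaker distorts the spectrum.
replayed = apply_gain(genuine, gain)
# Modulated replay: pre-compensate with the inverse filter, then play back.
modulated_replayed = apply_gain(apply_gain(genuine, inverse_gain), gain)

replay_error = spectral_error(genuine, replayed)
modulated_error = spectral_error(genuine, modulated_replayed)
print(f"classic replay spectral error:   {replay_error:.3f}")
print(f"modulated replay spectral error: {modulated_error:.3e}")
```

In this toy model the classic replay shows a large spectral deviation from the genuine signal, while the modulated replay matches it up to floating-point error. That is the gap in frequency-only defenses the abstract describes.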