Paper title
Differentially Private Speaker Anonymization
Paper authors
Paper abstract
Sharing real-world speech utterances is key to the training and deployment of voice-based services. However, it also raises privacy risks, as speech contains a wealth of personal data. Speaker anonymization aims to remove speaker information from a speech utterance while leaving its linguistic and prosodic attributes intact. State-of-the-art techniques operate by disentangling the speaker information (represented via a speaker embedding) from these attributes and re-synthesizing speech based on the speaker embedding of another speaker. Prior research in the privacy community has shown that anonymization often provides brittle privacy protection, let alone any provable guarantee. In this work, we show that disentanglement is indeed not perfect: linguistic and prosodic attributes still contain speaker information. We remove speaker information from these attributes by introducing differentially private feature extractors, based on an autoencoder and an automatic speech recognizer respectively, trained using noise layers. We plug these extractors into the state-of-the-art anonymization pipeline and generate, for the first time, private speech utterances with a provable upper bound on the speaker information they contain. We empirically evaluate the privacy and utility of our differentially private speaker anonymization approach on the LibriSpeech data set. Experimental results show that the generated utterances retain very high utility for automatic speech recognition training and inference, while being much better protected against strong adversaries who leverage full knowledge of the anonymization process to try to infer the speaker identity.
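The abstract does not say which noise distribution the noise layers use or where they sit in the extractors. As a rough illustration only, the sketch below shows one standard way a noise layer can give a differential privacy guarantee on a feature vector: clip the vector to a bounded L1 norm, then add Laplace noise calibrated to that bound. The function names (`clip_l1`, `laplace_noise_layer`) and the parameters `epsilon` and `bound` are hypothetical and are not taken from the paper.

```python
import random


def clip_l1(features, bound):
    """Scale the vector down so that its L1 norm is at most `bound`."""
    norm = sum(abs(v) for v in features)
    if norm <= bound:
        return list(features)
    return [v * bound / norm for v in features]


def laplace_noise_layer(features, epsilon, bound, rng):
    """Generic Laplace mechanism (an assumption, not the paper's exact layer).

    After clipping, any two feature vectors differ by at most 2 * bound in
    L1 norm, so per-coordinate Laplace noise with scale 2 * bound / epsilon
    makes the output epsilon-differentially private with respect to the input.
    """
    clipped = clip_l1(features, bound)
    scale = 2.0 * bound / epsilon
    # The difference of two i.i.d. exponentials with rate 1/scale
    # is a Laplace sample with location 0 and the given scale.
    return [
        v + rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
        for v in clipped
    ]


rng = random.Random(0)  # fixed seed only to make the demo repeatable
features = [0.5, -2.0, 1.5, 3.0]  # stand-in for an extracted feature vector
private = laplace_noise_layer(features, epsilon=1.0, bound=1.0, rng=rng)
```

A smaller `epsilon` means a larger noise scale and a tighter bound on how much the output can reveal about the input, at the cost of utility. This is the privacy and utility trade-off that the abstract says is evaluated on LibriSpeech.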
