Paper Title
Faked Speech Detection with Zero Prior Knowledge
Paper Authors
Abstract
Audio is one of the most common modes of human communication, but it can also be easily misused to deceive people. With the AI revolution, the relevant technologies are now accessible to almost everyone, making it simple for criminals to commit crimes and forgeries. In this work, we introduce a neural network method for building a classifier that blindly classifies an input audio clip as real or mimicked; the word 'blindly' refers to the ability to detect mimicked audio without references or real sources. We propose a deep neural network that follows a sequential model comprising three hidden layers, with alternating dense and dropout layers. The proposed model was trained on a set of 26 important features extracted from a large audio dataset, and the resulting classifier was tested on the same set of features extracted from different audio clips. The data was extracted from two raw datasets composed specifically for this work: an all-English dataset and a mixed dataset (Arabic plus English). The dataset can be provided in raw form on request by email to the first author. For comparison, the audio clips were also classified through human inspection, with native speakers as the subjects. The classifier correctly classified at least 94% of the test cases, compared with 85% accuracy for the human observers.
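The architecture described in the abstract (26 input features, three hidden layers with alternating dense and dropout layers, and a real-versus-mimicked decision) can be illustrated with a small forward-pass sketch. This is not the authors' implementation: the abstract gives no layer widths, activation functions, dropout rate, or framework, so `HIDDEN_SIZES`, `DROPOUT_RATE`, the ReLU and sigmoid choices, the 0.5 threshold, and all function names below are assumptions made only for illustration. The weights are random and untrained, so the predictions carry no meaning; the sketch only shows how data flows through such a stack.

```python
import math
import random

N_FEATURES = 26              # stated in the abstract
HIDDEN_SIZES = [64, 32, 16]  # assumption: layer widths are not given
DROPOUT_RATE = 0.2           # assumption: dropout rate is not given


def init_dense(n_in, n_out, rng):
    """Return (weights, biases) for one dense layer with small random weights."""
    scale = math.sqrt(2.0 / n_in)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, layer):
    """Affine map: one weighted sum plus bias per output unit."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def relu(x):
    return [max(0.0, v) for v in x]


def dropout(x, rate, rng, training):
    """Inverted dropout: zero units with probability `rate`, rescale the rest."""
    if not training or rate == 0.0:
        return list(x)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in x]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def build_model(seed=0):
    """Create three hidden dense layers and one output unit (untrained)."""
    rng = random.Random(seed)
    sizes = [N_FEATURES] + HIDDEN_SIZES
    hidden = [init_dense(a, b, rng) for a, b in zip(sizes, sizes[1:])]
    output = init_dense(sizes[-1], 1, rng)
    return hidden, output


def predict_proba(model, features, training=False, rng=None):
    """Forward pass: (dense -> ReLU -> dropout) x 3, then a sigmoid output."""
    if len(features) != N_FEATURES:
        raise ValueError(f"expected {N_FEATURES} features, got {len(features)}")
    hidden, output = model
    rng = rng or random.Random(0)
    x = list(features)
    for layer in hidden:
        x = dropout(relu(dense(x, layer)), DROPOUT_RATE, rng, training)
    return sigmoid(dense(x, output)[0])


def classify(model, features, threshold=0.5):
    """Assumed labelling: probability >= threshold means 'mimicked'."""
    return "mimicked" if predict_proba(model, features) >= threshold else "real"
```

Calling `classify(build_model(), feature_vector)` with a 26-element feature vector returns either `"real"` or `"mimicked"`. Dropout is active only when `training=True`, which mirrors how dropout layers are usually disabled at inference time; a real system would also need the feature extraction and training steps, which the abstract does not detail.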