Paper Title

Audio Tagging by Cross Filtering Noisy Labels

Authors

Zhu, Boqing, Xu, Kele, Kong, Qiuqiang, Wang, Huaimin, Peng, Yuxing

Abstract

High-quality labeled datasets have allowed deep learning to achieve impressive results on many sound analysis tasks. Yet it is labor-intensive to accurately annotate large amounts of audio data, and datasets may contain noisy labels in practical settings. Meanwhile, deep neural networks are susceptible to incorrectly labeled data because of their outstanding memorization ability. In this paper, we present a novel framework, named CrossFilter, to combat the noisy-label problem for audio tagging. Multiple representations (such as Logmel and MFCC) are used as the input of our framework to provide complementary information about the audio. Then, through the cooperation and interaction of two neural networks, we divide the dataset into curated and noisy subsets by incrementally picking out the possibly correctly labeled data from the noisy data. Moreover, our approach leverages multi-task learning on the curated and noisy subsets with different loss functions to fully utilize the entire dataset. The noise-robust loss function is employed to alleviate the adverse effects of incorrect labels. On both the FSDKaggle2018 and FSDKaggle2019 audio tagging datasets, empirical results demonstrate performance improvements compared with other competing approaches. On the FSDKaggle2018 dataset, our method achieves state-of-the-art performance and even surpasses ensemble models.
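
The abstract only outlines the cross-filtering mechanism at a high level. The following is a minimal, hypothetical sketch of the sample-selection step it describes, assuming a PyTorch setup with two networks fed complementary features (log-mel for one, MFCC for the other); the agreement rule, threshold, and function names are illustrative assumptions, not the paper's actual implementation.

# Hypothetical sketch of the cross-filtering selection step described above.
import torch

def cross_filter_step(net_logmel, net_mfcc, logmel, mfcc, labels, threshold=0.5):
    """Return a boolean mask of samples both networks deem correctly labeled.

    `labels` is a multi-hot tensor of the (possibly noisy) tags; a sample is
    promoted to the curated subset only when both networks assign its observed
    tags an average probability above `threshold` (an assumed agreement rule).
    """
    with torch.no_grad():
        prob_a = torch.sigmoid(net_logmel(logmel))  # multi-label tagging -> sigmoid outputs
        prob_b = torch.sigmoid(net_mfcc(mfcc))
    # Average confidence of each network in the given tags of each sample.
    conf_a = (prob_a * labels).sum(dim=1) / labels.sum(dim=1).clamp(min=1)
    conf_b = (prob_b * labels).sum(dim=1) / labels.sum(dim=1).clamp(min=1)
    return (conf_a > threshold) & (conf_b > threshold)

# Repeating this step over training epochs would grow the curated subset
# incrementally; curated samples could then be trained with a standard loss and
# the remaining noisy samples with a noise-robust loss, mirroring the
# multi-task objective described in the abstract.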
