Paper title
Multi-Task Learning for Interpretable Weakly Labelled Sound Event Detection
Paper authors
Paper abstract
Weakly labelled learning has garnered a lot of attention in recent years because of its potential to scale Sound Event Detection (SED), and it is usually formulated as a Multiple Instance Learning (MIL) problem. This paper proposes a Multi-Task Learning (MTL) framework for learning from weakly labelled audio data that encompasses the traditional MIL setup. To show the utility of the proposed framework, we use reconstruction of the input time-frequency (T-F) representation as the auxiliary task. We show that the chosen auxiliary task de-noises the internal T-F representation and improves SED performance on noisy recordings. Our second contribution is a two-step attention pooling mechanism. With two steps in the attention mechanism, the network retains better T-F level information without compromising SED performance. Visualising the first-step and second-step attention weights helps localise the audio event in the T-F domain. To evaluate the proposed framework, we remix the DCASE 2019 Task 1 acoustic scene data with the DCASE 2018 Task 2 sound event data at 0, 10 and 20 dB SNR, resulting in a multi-class weakly labelled SED problem. The complete proposed framework outperforms existing benchmark models at all SNRs, with improvements of 22.3%, 12.8% and 5.9% over the benchmark model at 0, 10 and 20 dB SNR respectively. We carry out an ablation study to determine the contributions of the auxiliary task and of two-step attention pooling to the SED performance improvement. The code is publicly released.
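As an illustration of the remixing step mentioned in the abstract, the following is a minimal sketch of mixing a sound event with an acoustic scene at a target SNR. It assumes that the sound event is treated as the signal and the acoustic scene as the noise, and that SNR is defined on mean power over the whole clip; the abstract does not state either choice. The names `mix_at_snr` and `measured_snr_db` are hypothetical and are not taken from the released code.

```python
import numpy as np


def mix_at_snr(event, background, snr_db):
    """Mix `event` with `background` so that the event-to-background
    power ratio equals `snr_db` (in dB).

    Assumption: the event is the "signal" and the background scene is
    the "noise"; both are 1-D arrays of the same length.
    Returns the mixture and the gain applied to the background.
    """
    event = np.asarray(event, dtype=np.float64)
    background = np.asarray(background, dtype=np.float64)
    if event.shape != background.shape:
        raise ValueError("event and background must have the same shape")

    p_event = np.mean(event ** 2)
    p_background = np.mean(background ** 2)
    if p_event == 0.0 or p_background == 0.0:
        raise ValueError("both inputs must have non-zero power")

    # Solve 10 * log10(p_event / (gain**2 * p_background)) = snr_db for gain.
    gain = np.sqrt(p_event / (p_background * 10.0 ** (snr_db / 10.0)))
    return event + gain * background, gain


def measured_snr_db(event, scaled_background):
    """SNR in dB between an event and an already-scaled background."""
    return 10.0 * np.log10(np.mean(event ** 2) / np.mean(scaled_background ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    event = rng.standard_normal(16000)             # stand-in for a sound event clip
    background = 0.3 * rng.standard_normal(16000)  # stand-in for an acoustic scene clip
    for snr in (0, 10, 20):
        mixture, gain = mix_at_snr(event, background, snr)
        print(snr, round(measured_snr_db(event, gain * background), 6))
```

Scaling the background rather than the event keeps the event level fixed across SNR conditions; scaling the event instead would be an equally valid convention.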
