Paper Title


Towards Musically Meaningful Explanations Using Source Separation

Paper Authors

Verena Haunschmid, Ethan Manilow, Gerhard Widmer

Paper Abstract


Deep neural networks (DNNs) are successfully applied in a wide variety of music information retrieval (MIR) tasks. Such models are usually considered "black boxes", meaning that their predictions are not interpretable. Prior work on explainable models in MIR has generally used image processing tools to produce explanations for DNN predictions, but these are not necessarily musically meaningful, or can be listened to (which, arguably, is important in music). We propose audioLIME, a method based on Local Interpretable Model-agnostic Explanation (LIME), extended by a musical definition of locality. LIME learns locally linear models on perturbations of an example that we want to explain. Instead of extracting components of the spectrogram using image segmentation as part of the LIME pipeline, we propose using source separation. The perturbations are created by switching on/off sources which makes our explanations listenable. We first validate audioLIME on a classifier that was deliberately trained to confuse the true target with a spurious signal, and show that this can easily be detected using our method. We then show that it passes a sanity check that many available explanation methods fail. Finally, we demonstrate the general applicability of our (model-agnostic) method on a third-party music tagger.
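The core idea described above can be sketched in a few lines: separate the input into sources, randomly switch sources on and off to create perturbed mixtures, query the black-box model on each mixture, and fit a local linear surrogate whose per-source weights serve as the explanation. This is a minimal illustrative sketch, not the authors' implementation; the function name `lime_explain` and the toy model are hypothetical, and a real pipeline would use an actual source separator and classifier.

```python
import numpy as np

def lime_explain(sources, predict, n_samples=200, seed=0):
    """LIME-style explanation over separated sources (illustrative sketch).

    sources : list of K waveform arrays (output of a source separator)
    predict : black-box function mapping a mixture waveform -> scalar score
    Returns one importance weight per source.
    """
    rng = np.random.default_rng(seed)
    k = len(sources)
    # Interpretable representation: binary vector saying which sources are "on".
    Z = rng.integers(0, 2, size=(n_samples, k)).astype(float)
    Z[0] = 1.0  # always include the full, unperturbed mixture
    # Each perturbation is an audible mixture of the switched-on sources.
    y = np.array([predict(sum(z_j * s for z_j, s in zip(z, sources))) for z in Z])
    # Local linear surrogate via least squares; add a bias column.
    A = np.hstack([Z, np.ones((n_samples, 1))])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w[:k]  # importance weight per source

# Toy check: a "model" that only listens to the first source.
s0 = np.array([1.0, 0.0])   # stand-in for, e.g., the vocals stem
s1 = np.array([0.0, 1.0])   # stand-in for an accompaniment stem
weights = lime_explain([s0, s1], predict=lambda mix: float(mix[0]))
# weights[0] dominates: the explanation attributes the prediction to source 0
```

Because each perturbation is itself a mixture of real separated stems, every point the surrogate is fit on can be listened to, which is the property the abstract emphasizes.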
