Paper Title


MATT: A Multiple-instance Attention Mechanism for Long-tail Music Genre Classification

Authors

Xiaokai Liu, Menghua Zhang

Abstract


Imbalanced music genre classification is a crucial task in the Music Information Retrieval (MIR) field. It aims to identify long-tail, data-poor genres from their associated music audio segments, a situation that is very prevalent in real-world scenarios. Most existing models are designed for class-balanced music datasets, so they achieve poor accuracy and generalization when identifying the genres at the tail of the distribution. Inspired by the success of Multi-instance Learning (MIL) in various classification tasks, we propose a novel mechanism named Multi-instance Attention (MATT) to boost performance on tail classes. Specifically, we first construct bag-level datasets by generating album-artist pair bags. Second, we leverage neural networks to encode the music audio segments. Finally, guided by the multi-instance attention mechanism, the neural network-based models can select the most informative genre to match the given music segment. Comprehensive experiments on a large-scale music genre benchmark dataset with a long-tail distribution demonstrate that MATT significantly outperforms other state-of-the-art baselines.
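The three steps in the abstract (bag construction, segment encoding, attention-guided selection) can be illustrated with a small sketch. This is not the authors' MATT implementation; it is a minimal, framework-free illustration under stated assumptions. The segment encoder is assumed to have already produced a fixed-length embedding per audio segment. Each genre is assumed to have one query vector, and instances in a bag are weighted by a softmax over their dot product with that query, in the style of selective attention from MIL relation extraction. The field names (`album`, `artist`, `embedding`), the functions, and the toy vectors are all hypothetical.

```python
import math
from collections import defaultdict


def build_bags(segments):
    """Group segment embeddings into bags keyed by (album, artist).

    Hypothetical input: each segment is a dict with 'album', 'artist'
    and a precomputed 'embedding' (list of floats) from some encoder.
    """
    bags = defaultdict(list)
    for seg in segments:
        bags[(seg["album"], seg["artist"])].append(seg["embedding"])
    return dict(bags)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(xs):
    # Subtract the max for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(instances, query):
    """Selective attention: weight instances by their match with the query,
    then return the weighted-sum bag vector and the attention weights."""
    weights = softmax([dot(x, query) for x in instances])
    dim = len(instances[0])
    bag_vec = [sum(w * x[d] for w, x in zip(weights, instances)) for d in range(dim)]
    return bag_vec, weights


def classify_bag(instances, genre_queries):
    """Score each genre by attending over the bag with that genre's
    (assumed, normally learned) query vector; return the best genre."""
    scores = {}
    for genre, query in genre_queries.items():
        bag_vec, _ = attend(instances, query)
        scores[genre] = dot(bag_vec, query)
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Toy 2-D embeddings and genre queries, purely illustrative.
    segments = [
        {"album": "A1", "artist": "X", "embedding": [1.0, 0.0]},
        {"album": "A1", "artist": "X", "embedding": [0.9, 0.1]},
        {"album": "B2", "artist": "Y", "embedding": [0.0, 1.0]},
    ]
    genre_queries = {"jazz": [1.0, 0.0], "metal": [0.0, 1.0]}
    for key, instances in build_bags(segments).items():
        print(key, classify_bag(instances, genre_queries)[0])
```

In a trained model the encoder and the per-genre query vectors would be learned jointly from bag-level labels. Because the softmax lets each bag emphasize the segments most consistent with a candidate genre, noisy or atypical segments contribute less to the bag representation. This is the general motivation for applying MIL attention when tail genres have few clean examples.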
