Audio Defect Detection in Music with Deep Networks

Title

Audio Defect Detection in Music with Deep Networks

Authors

Wolff, Daniel; Mignot, Rémi; Roebel, Axel

Abstract

With increasing amounts of music being digitally transferred from production to distribution, automatic means of determining media quality are needed. Protection mechanisms in digital audio processing tools have not eliminated the need for production entities located downstream in the distribution chain to assess audio quality and detect defects inserted further upstream. Such analysis often relies on the received audio and scarce metadata alone. The deliberate use of artefacts such as clicks in popular music, as well as more recent defects stemming from corruption in modern audio encodings, calls for data-centric and context-sensitive detection solutions. We present a convolutional network architecture following an end-to-end encoder-decoder configuration to develop detectors for two exemplary audio defects. A click detector is trained and compared to a traditional signal processing method, with a discussion of context sensitivity. Additional post-processing is used for data augmentation and workflow simulation. The ability of our models to capture variance is explored in a detector for artefacts arising from the decompression of corrupted MP3-compressed audio. For both tasks, we describe the synthetic generation of artefacts for controlled detector training and evaluation. We evaluate our detectors on the large open-source Free Music Archive (FMA) and on genre-specific datasets.
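The abstract mentions synthetic generation of click artefacts for controlled training and evaluation, and a comparison against a traditional signal processing detector. The Python sketch below illustrates that general idea under our own assumptions only: clicks are modeled as short, exponentially decaying impulses placed so they never overlap, the "music" is a 440 Hz test tone, and the detector is a deliberately naive second-difference threshold. The names `insert_clicks`, `detect_clicks` and `event_recall`, and all parameter values, are hypothetical; this is not the authors' artefact generator, their baseline, or their network.

```python
import math
import random


def insert_clicks(signal, n_clicks, amp=0.8, max_len=4, seed=0):
    """Add short decaying impulses ("clicks") to a copy of `signal`.

    Hypothetical click model: length 1..max_len samples, peak amplitude `amp`,
    random sign, exponential decay. Each click lies in its own segment with a
    margin of max_len samples, so clicks never overlap.
    Returns (corrupted, labels, spans); labels[i] == 1 marks corrupted samples,
    spans holds (start, end) per click with `end` exclusive.
    """
    rng = random.Random(seed)
    out = list(signal)
    labels = [0] * len(signal)
    spans = []
    seg = len(signal) // n_clicks
    for c in range(n_clicks):
        length = rng.randint(1, max_len)
        start = c * seg + rng.randint(max_len, seg - 2 * max_len)
        sign = rng.choice((-1.0, 1.0))
        for k in range(length):
            out[start + k] += sign * amp * math.exp(-k)
            labels[start + k] = 1
        spans.append((start, start + length))
    return out, labels, spans


def detect_clicks(x, k=10.0):
    """Naive baseline: flag sample m if |x[m+1] - 2*x[m] + x[m-1]| > k * median."""
    d = [abs(x[m + 1] - 2.0 * x[m] + x[m - 1]) for m in range(1, len(x) - 1)]
    thresh = k * sorted(d)[len(d) // 2]
    # d[j] belongs to sample m = j + 1
    return [j + 1 for j, v in enumerate(d) if v > thresh]


def event_recall(flags, spans):
    """Fraction of click events with at least one flagged sample inside them."""
    flagged = set(flags)
    hits = sum(any(i in flagged for i in range(s, e)) for s, e in spans)
    return hits / len(spans)


fs = 16000
clean = [0.3 * math.sin(2 * math.pi * 440 * n / fs) for n in range(fs)]
noisy, labels, spans = insert_clicks(clean, n_clicks=10)
flags = detect_clicks(noisy)
print(event_recall(flags, spans))  # expected: 1.0
```

Because the generator records exact click positions, detections can be scored against ground truth at sample or event level. On real music, which is itself impulsive (drums, deliberately used clicks), such a fixed context-free threshold would raise many false alarms; this is the kind of context sensitivity the abstract argues a learned detector should handle.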
