Paper Title

Contrastive Learning of General-Purpose Audio Representations

Paper Authors

Aaqib Saeed, David Grangier, Neil Zeghidour

Paper Abstract

We introduce COLA, a self-supervised pre-training approach for learning a general-purpose representation of audio. Our approach is based on contrastive learning: it learns a representation which assigns high similarity to audio segments extracted from the same recording while assigning lower similarity to segments from different recordings. We build on top of recent advances in contrastive learning for computer vision and reinforcement learning to design a lightweight, easy-to-implement self-supervised model of audio. We pre-train embeddings on the large-scale Audioset database and transfer these representations to 9 diverse classification tasks, including speech, music, animal sounds, and acoustic scenes. We show that despite its simplicity, our method significantly outperforms previous self-supervised systems. We furthermore conduct ablation studies to identify key design choices and release a library to pre-train and fine-tune COLA models.
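The contrastive objective described above (high similarity for segments from the same recording, lower similarity for segments from different recordings) can be sketched as an in-batch softmax cross-entropy over pairwise similarities. This is a simplified illustration, not the paper's implementation: COLA operates on learned encoder embeddings and uses a learned (bilinear) similarity, whereas here a plain dot product stands in for it.

```python
import numpy as np

def contrastive_loss(anchors: np.ndarray, positives: np.ndarray) -> float:
    """InfoNCE-style loss over a batch of segment embeddings.

    anchors[i] and positives[i] are embeddings of two segments taken from
    the same recording i; positives[j] for j != i (other recordings in the
    batch) serve as negatives. Dot product stands in for COLA's learned
    similarity (a simplifying assumption).
    """
    # Pairwise similarity matrix: sim[i, j] = anchors[i] . positives[j]
    sim = anchors @ positives.T                        # shape (B, B)
    # Numerically stable log-softmax over each row
    sim = sim - sim.max(axis=1, keepdims=True)
    log_probs = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    # Target for row i is column i (the segment from the same recording)
    return float(-np.mean(np.diag(log_probs)))
```

When anchor and positive embeddings of the same recording are much more similar to each other than to other recordings, the loss approaches zero; for unrelated embeddings it sits near `log(batch_size)`.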
