Paper Title

Integrated Parameter-Efficient Tuning for General-Purpose Audio Models

Paper Authors

Ju-ho Kim, Jungwoo Heo, Hyun-seo Shin, Chan-yeong Lim, Ha-Jin Yu

Paper Abstract

The advent of hyper-scale and general-purpose pre-trained models is shifting the paradigm of building task-specific models for target tasks. In the field of audio research, task-agnostic pre-trained models with high transferability and adaptability have achieved state-of-the-art performance through fine-tuning on downstream tasks. Nevertheless, re-training all the parameters of these massive models entails an enormous amount of time and cost, along with a huge carbon footprint. To overcome these limitations, the present study explores and applies efficient transfer learning methods in the audio domain. We also propose an integrated parameter-efficient tuning (IPET) framework that aggregates the embedding prompt (a prompt-based learning approach) and the adapter (an effective transfer learning method). We demonstrate the efficacy of the proposed framework using two backbone pre-trained audio models with different characteristics: the audio spectrogram transformer and wav2vec 2.0. The proposed IPET framework achieves remarkable performance with fewer trainable parameters compared to the fine-tuning method in four downstream tasks: sound event classification, music genre classification, keyword spotting, and speaker verification. Furthermore, the authors identify and analyze the shortcomings of the IPET framework, providing lessons and research directions for parameter-efficient tuning in the audio domain.
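
The abstract describes IPET only at a high level: learnable prompt vectors are prepended to the input embedding sequence, and small adapter modules are trained while the backbone stays frozen. The PyTorch sketch below is a minimal illustration of those two ingredients, not the authors' released implementation; the module names (`BottleneckAdapter`, `IPETSketch`), the prompt count, the bottleneck width, and the exact insertion points are assumptions for illustration, and the paper's configurations for the audio spectrogram transformer and wav2vec 2.0 may differ.

```python
import torch
import torch.nn as nn


class BottleneckAdapter(nn.Module):
    """Residual bottleneck adapter: down-project, non-linearity, up-project."""

    def __init__(self, dim: int, bottleneck: int = 64):  # widths are illustrative
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))  # residual connection


class IPETSketch(nn.Module):
    """Hypothetical IPET-style wrapper: embedding prompts plus per-layer adapters
    around a stack of frozen transformer blocks mapping (B, T, D) -> (B, T, D)."""

    def __init__(self, blocks: nn.ModuleList, dim: int, num_prompts: int = 8):
        super().__init__()
        self.blocks = blocks
        for p in self.blocks.parameters():
            p.requires_grad = False  # backbone frozen; only prompts/adapters train
        self.prompts = nn.Parameter(0.02 * torch.randn(1, num_prompts, dim))
        self.adapters = nn.ModuleList(BottleneckAdapter(dim) for _ in blocks)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Prepend learnable prompt tokens to the patch/frame embeddings.
        x = torch.cat([self.prompts.expand(x.size(0), -1, -1), x], dim=1)
        for block, adapter in zip(self.blocks, self.adapters):
            x = adapter(block(x))  # adapter applied after each frozen block
        return x


if __name__ == "__main__":
    # Toy check with generic transformer blocks standing in for the backbone.
    dim = 128
    blocks = nn.ModuleList(
        nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        for _ in range(2)
    )
    model = IPETSketch(blocks, dim)
    out = model(torch.randn(2, 50, dim))  # (batch, frames, dim)
    print(out.shape)  # torch.Size([2, 58, 128]) -- 8 prompts + 50 frames
```

Under this setup, only the prompt tokens and adapter weights receive gradients, which is what makes the tuning parameter-efficient: the trainable parameters amount to a small fraction of the frozen backbone's.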
