Title

Comparing Representations for Audio Synthesis Using Generative Adversarial Networks

Authors

Nistal, Javier; Lattner, Stefan; Richard, Gaël

Abstract

In this paper, we compare different audio signal representations, including the raw audio waveform and a variety of time-frequency representations, for the task of audio synthesis with Generative Adversarial Networks (GANs). We conduct the experiments on a subset of the NSynth dataset. The architecture follows the benchmark Progressive Growing Wasserstein GAN. We perform experiments both in a fully unconditional manner and with the network conditioned on pitch information. We quantitatively evaluate the generated material using standard metrics for assessing generative models, and compare training and sampling times. We show that the complex-valued Short-Time Fourier Transform (STFT), as well as the STFT magnitude combined with Instantaneous Frequency, achieve the best results and yield fast generation and inversion times. The code for feature extraction, training, and evaluating the model is available online.
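One of the best-performing representations named in the abstract pairs the STFT magnitude with its Instantaneous Frequency (IF). The sketch below, written in Python with NumPy, shows one common way to compute such a representation and invert it back to audio: IF is taken as the frame-to-frame difference of the unwrapped STFT phase, and inversion integrates it with a cumulative sum before a weighted overlap-add inverse STFT. The function names (`stft`, `istft`, `to_mag_if`, `from_mag_if`), the frame and hop sizes, the log-magnitude offset `eps`, and the division of IF by pi are illustrative assumptions, not the authors' released feature-extraction code.

```python
import numpy as np

N_FFT = 1024  # assumed frame size, not necessarily the paper's setting
HOP = 256     # assumed hop size (75% overlap)


def _window(n_fft):
    # Periodic Hann window
    return np.hanning(n_fft + 1)[:-1]


def stft(x, n_fft=N_FFT, hop=HOP):
    """Complex STFT with shape (freq_bins, frames)."""
    w = _window(n_fft)
    xp = np.pad(x, n_fft // 2)  # zero-pad so edge samples are fully covered
    n_frames = 1 + (len(xp) - n_fft) // hop
    frames = np.stack([xp[i * hop:i * hop + n_fft] * w for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1).T


def istft(spec, length, n_fft=N_FFT, hop=HOP):
    """Weighted overlap-add inverse of stft()."""
    w = _window(n_fft)
    frames = np.fft.irfft(spec.T, n=n_fft, axis=1) * w
    out_len = n_fft + hop * (frames.shape[0] - 1)
    out = np.zeros(out_len)
    norm = np.zeros(out_len)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + n_fft] += frame
        norm[i * hop:i * hop + n_fft] += w ** 2
    out /= np.where(norm > 1e-8, norm, 1.0)  # avoid division by zero at the edges
    start = n_fft // 2
    return out[start:start + length]  # drop the zero padding added in stft()


def to_mag_if(spec, eps=1e-6):
    """Log-magnitude and instantaneous frequency (normalized to [-1, 1])."""
    log_mag = np.log(np.abs(spec) + eps)
    phase = np.unwrap(np.angle(spec), axis=1)  # unwrap along time
    # Finite difference over time; the first frame keeps its absolute phase
    inst_freq = np.diff(phase, axis=1, prepend=0.0) / np.pi
    return log_mag, inst_freq


def from_mag_if(log_mag, inst_freq, eps=1e-6):
    """Invert to_mag_if(): integrate the IF back into phase."""
    mag = np.maximum(np.exp(log_mag) - eps, 0.0)
    phase = np.cumsum(inst_freq * np.pi, axis=1)
    return mag * np.exp(1j * phase)


# Usage: round-trip a 1-second 440 Hz tone at NSynth's 16 kHz sample rate
sr = 16000
x = np.sin(2 * np.pi * 440.0 * np.arange(sr) / sr)
log_mag, inst_freq = to_mag_if(stft(x))
y = istft(from_mag_if(log_mag, inst_freq), length=len(x))
max_err = np.max(np.abs(y - x))  # reconstruction error of the round trip
```

Because the first frame keeps its absolute phase, this mapping is invertible up to floating-point error. A magnitude/IF pair produced by a generator carries no such consistency guarantee, but the same cumulative-sum inversion still yields a waveform without any iterative phase reconstruction, which is one reason such representations can be inverted quickly.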
