Paper title
Automatic multitrack mixing with a differentiable mixing console of neural audio effects
Paper authors
Paper abstract
Applications of deep learning to automatic multitrack mixing are largely unexplored. This is partly due to the limited available data, coupled with the fact that such data is relatively unstructured and variable. To address these challenges, we propose a domain-inspired model with a strong inductive bias for the mixing task. We achieve this with the application of pre-trained sub-networks and weight sharing, as well as with a sum/difference stereo loss function. The proposed model can be trained with a limited number of examples, is permutation invariant with respect to the input ordering, and places no limit on the number of input sources. Furthermore, it produces human-readable mixing parameters, allowing users to manually adjust or refine the generated mix. Results from a perceptual evaluation involving audio engineers indicate that our approach generates mixes that outperform baseline approaches. To the best of our knowledge, this work demonstrates the first approach to learning multitrack mixing conventions from real-world data at the waveform level, without knowledge of the underlying mixing parameters.
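To illustrate the sum/difference stereo loss mentioned in the abstract, here is a minimal sketch. It assumes a simple L2 distance on each component; the paper's actual loss may operate on spectral representations instead, and the function name and shapes are illustrative only. The sum (mid) signal is the left channel plus the right, and the difference (side) signal is the left minus the right.

```python
import numpy as np

def sum_diff_stereo_loss(pred, target):
    """Hypothetical sum/difference (mid/side) stereo loss sketch.

    pred, target: float arrays of shape (2, n_samples) holding the
    left and right channels of the predicted and reference mixes.
    Uses a plain mean-squared error on each component for illustration.
    """
    # Mid (sum) and side (difference) signals of the predicted mix
    pred_sum, pred_diff = pred[0] + pred[1], pred[0] - pred[1]
    # Mid and side signals of the reference mix
    tgt_sum, tgt_diff = target[0] + target[1], target[0] - target[1]
    loss_sum = np.mean((pred_sum - tgt_sum) ** 2)
    loss_diff = np.mean((pred_diff - tgt_diff) ** 2)
    return loss_sum + loss_diff
```

Comparing sum and difference components, rather than left and right channels directly, lets the loss penalize errors in the stereo image (panning, width) separately from errors in the overall content.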