Paper Title
Tatum-Level Drum Transcription Based on a Convolutional Recurrent Neural Network with Language Model-Based Regularized Training
Paper Authors
Paper Abstract
This paper describes a neural drum transcription method that detects from music signals the onset times of drums at the $\textit{tatum}$ level, where tatum times are assumed to be estimated in advance. In conventional studies on drum transcription, deep neural networks (DNNs) have often been used to take a music spectrogram as input and estimate the onset times of drums at the $\textit{frame}$ level. The major problem with such frame-to-frame DNNs, however, is that the estimated onset times do not often conform with the typical tatum-level patterns appearing in symbolic drum scores because the long-term musically meaningful structures of those patterns are difficult to learn at the frame level. To solve this problem, we propose a regularized training method for a frame-to-tatum DNN. In the proposed method, a tatum-level probabilistic language model (gated recurrent unit (GRU) network or repetition-aware bi-gram model) is trained from an extensive collection of drum scores. Given that the musical naturalness of tatum-level onset times can be evaluated by the language model, the frame-to-tatum DNN is trained with a regularizer based on the pretrained language model. The experimental results demonstrate the effectiveness of the proposed regularized training method.
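The abstract describes training a frame-to-tatum DNN with a loss that combines a transcription term and a regularizer derived from a pretrained tatum-level language model. Below is a minimal sketch, not the authors' implementation, of how such regularized training could be wired up in PyTorch; the class names, layer sizes, the weighting coefficient `lambda_lm`, and the exact form of the regularizer are illustrative assumptions rather than details taken from the paper.

```python
# Hypothetical sketch of language-model-based regularized training.
# Shapes assumed: (batch, tatums, drums), with per-tatum onset probabilities.
import torch
import torch.nn as nn


class TatumGRULanguageModel(nn.Module):
    """Tatum-level language model (illustrative): predicts drum activations
    at tatum t from the activations at tatums < t."""

    def __init__(self, num_drums: int = 3, hidden: int = 64):
        super().__init__()
        self.gru = nn.GRU(num_drums, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_drums)

    def forward(self, onsets: torch.Tensor) -> torch.Tensor:
        # Condition the prediction at tatum t on activations at tatums < t.
        prev = torch.cat([torch.zeros_like(onsets[:, :1]), onsets[:, :-1]], dim=1)
        h, _ = self.gru(prev)
        return torch.sigmoid(self.head(h))


def regularized_loss(pred: torch.Tensor,
                     target: torch.Tensor,
                     language_model: TatumGRULanguageModel,
                     lambda_lm: float = 0.1) -> torch.Tensor:
    """Transcription loss plus a language-model regularizer that penalizes
    tatum-level outputs the pretrained LM considers musically unnatural."""
    # Standard tatum-level transcription loss against ground-truth onsets.
    bce = nn.functional.binary_cross_entropy(pred, target)

    # The LM is pretrained on drum scores and kept frozen here.
    with torch.no_grad():
        lm_probs = language_model(pred)

    # Negative log-likelihood of the DNN's (soft) outputs under the LM;
    # gradients flow only through `pred`, not through the frozen LM.
    reg = -(pred * torch.log(lm_probs + 1e-8)
            + (1 - pred) * torch.log(1 - lm_probs + 1e-8)).mean()
    return bce + lambda_lm * reg
```

Freezing the pretrained language model and back-propagating only through the DNN's outputs is one plausible way to realize "training with a regularizer based on the pretrained language model"; the paper's actual regularizer and the repetition-aware bi-gram alternative may differ in form.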