Paper Title

Orthogonal Gated Recurrent Unit with Neumann-Cayley Transformation

Paper Authors

Edison Mucllari, Vasily Zadorozhnyy, Cole Pospisil, Duc Nguyen, Qiang Ye

Paper Abstract

In recent years, the use of orthogonal matrices has been shown to be a promising approach to improving Recurrent Neural Networks (RNNs) in terms of training, stability, and convergence, particularly for controlling gradients. While Gated Recurrent Unit (GRU) and Long Short-Term Memory (LSTM) architectures address the vanishing gradient problem by using a variety of gates and memory cells, they are still prone to the exploding gradient problem. In this work, we analyze the gradients in GRU and propose the use of orthogonal matrices to prevent the exploding gradient problem and enhance long-term memory. We study where to use orthogonal matrices, and we propose a Neumann series-based Scaled Cayley transformation for training orthogonal matrices in GRU, which we call Neumann-Cayley Orthogonal GRU, or simply NC-GRU. We present detailed experiments with our model on several synthetic and real-world tasks, which show that NC-GRU significantly outperforms GRU as well as several other RNNs.
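
The Neumann-Cayley idea named in the abstract combines two standard building blocks: the scaled Cayley transform, which maps a skew-symmetric matrix A and a diagonal matrix D with ±1 entries to an orthogonal matrix W = (I + A)^{-1}(I - A)D, and a truncated Neumann series that approximates the inverse (I + A)^{-1} without an explicit matrix inversion. The sketch below is a minimal NumPy illustration of that combination, not the authors' implementation; the function names, the number of series terms, and the toy check are assumptions made here for demonstration.

```python
import numpy as np

def neumann_inverse(A, num_terms=8):
    # Truncated Neumann series for (I + A)^{-1}:
    #   (I + A)^{-1} ≈ I - A + A^2 - A^3 + ...   (converges when ||A|| < 1)
    n = A.shape[0]
    term = np.eye(n)
    approx = np.eye(n)
    for _ in range(1, num_terms):
        term = term @ (-A)
        approx = approx + term
    return approx

def scaled_cayley(A, d, num_terms=8):
    # Scaled Cayley transform W = (I + A)^{-1} (I - A) D, where A is
    # skew-symmetric and D = diag(d) has ±1 entries; W is orthogonal
    # up to the truncation error of the Neumann series.
    n = A.shape[0]
    return neumann_inverse(A, num_terms) @ (np.eye(n) - A) @ np.diag(d)

# Toy check (illustrative values): build a small skew-symmetric A so the
# series converges quickly, then measure the deviation from orthogonality.
rng = np.random.default_rng(0)
B = 0.1 * rng.standard_normal((4, 4))
A = B - B.T                          # skew-symmetric parametrization
d = np.array([1.0, -1.0, 1.0, 1.0])  # hypothetical ±1 scaling
W = scaled_cayley(A, d)
print(np.abs(W.T @ W - np.eye(4)).max())  # should be small
```

In this sketch the orthogonal matrix is parametrized by a skew-symmetric A, so gradient updates can be applied to A while W stays (approximately) orthogonal; the truncated series replaces the exact inverse used in the standard Cayley transform.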
