Paper Title
Glancing Transformer for Non-Autoregressive Neural Machine Translation
Paper Authors
Paper Abstract
Recent work on non-autoregressive neural machine translation (NAT) aims to improve efficiency through parallel decoding without sacrificing quality. However, existing NAT methods are either inferior to the Transformer or require multiple decoding passes, leading to reduced speedup. We propose the Glancing Language Model (GLM), a method to learn word interdependency for single-pass parallel generation models. With GLM, we develop the Glancing Transformer (GLAT) for machine translation. With only single-pass parallel decoding, GLAT is able to generate high-quality translations with an 8-15 times speedup. Experiments on multiple WMT language directions show that GLAT outperforms all previous single-pass non-autoregressive methods, and is nearly comparable to the Transformer, reducing the gap to 0.25-0.9 BLEU points.
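As a rough illustration of how a single-pass parallel generation model can still learn word interdependency during training, below is a minimal PyTorch-style sketch of a glancing-style training step. The decoder interface, the mask-token convention, the glance_ratio hyperparameter, and the use of the first pass's Hamming distance to decide how many reference tokens to reveal are assumptions made for illustration; this is not the authors' released implementation.

import torch
import torch.nn.functional as F


def glancing_training_step(decoder, encoder_out, tgt_tokens, mask_id, pad_id, glance_ratio=0.5):
    """One glancing-style training step: peek at part of the reference, learn the rest.

    decoder:    callable mapping (decoder_input [B, T], encoder_out) -> logits [B, T, V]
    tgt_tokens: reference target tokens [B, T]
    (Interface and glance_ratio are illustrative assumptions, not the paper's code.)
    """
    non_pad = tgt_tokens.ne(pad_id)

    # First pass: predict every target position in parallel from a fully masked input.
    masked_input = tgt_tokens.masked_fill(non_pad, mask_id)
    with torch.no_grad():
        first_pred = decoder(masked_input, encoder_out).argmax(-1)

    # Reveal a number of reference tokens proportional to how many positions the
    # first pass got wrong (its Hamming distance to the reference).
    n_wrong = (first_pred.ne(tgt_tokens) & non_pad).sum(-1)            # [B]
    n_glance = (n_wrong.float() * glance_ratio).long()                 # tokens to reveal

    # Randomly pick which non-pad positions to reveal in the second-pass input.
    scores = torch.rand_like(tgt_tokens, dtype=torch.float).masked_fill(~non_pad, -1.0)
    ranks = scores.argsort(-1, descending=True).argsort(-1)            # per-position rank
    glance_mask = ranks < n_glance.unsqueeze(-1)                       # [B, T] revealed

    second_input = torch.where(glance_mask, tgt_tokens, masked_input)

    # Second pass: the loss covers only positions that were NOT revealed, so the model
    # must predict them conditioned on the glanced fragments of the reference.
    logits = decoder(second_input, encoder_out)
    loss_mask = non_pad & ~glance_mask
    return F.cross_entropy(logits[loss_mask], tgt_tokens[loss_mask])

At inference time no reference tokens are available to glance at, so only the single fully parallel decoding pass is run, which is what keeps generation single-pass as described in the abstract.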