Paper Title
Diffusion Glancing Transformer for Parallel Sequence-to-Sequence Learning
Paper Authors
Paper Abstract
Previously, non-autoregressive models were widely perceived as being superior in generation efficiency but inferior in generation quality due to the difficulties of modeling multiple target modalities. To enhance the multi-modality modeling ability, we propose the diffusion glancing transformer, which employs a modality diffusion process and residual glancing sampling. The modality diffusion process is a discrete process that interpolates the multi-modal distribution along the decoding steps, and the residual glancing sampling approach guides the model to continuously learn the remaining modalities across the layers. Experimental results on various machine translation and text generation benchmarks demonstrate that DIFFGLAT achieves better generation accuracy while maintaining fast decoding speed compared with both autoregressive and non-autoregressive models.
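To make the "glancing sampling" idea concrete, here is a minimal sketch of the training-time sampling step that glancing-based non-autoregressive models use: the model first predicts all target tokens in parallel, and a fraction of the target tokens, proportional to how many predictions were wrong, is revealed to the decoder as input for a second pass. This is an illustrative simplification, not the DiffGLAT procedure itself (which additionally applies a modality diffusion process and repeats residual glancing across layers); the function name, `ratio` parameter, and mask token are hypothetical.

```python
import random

MASK = "<mask>"  # hypothetical placeholder token for unrevealed positions

def glancing_sample(target, prediction, ratio=0.5):
    """Build a glanced decoder input: reveal a random subset of target
    tokens whose size is proportional to the parallel prediction error.

    target, prediction: equal-length lists of tokens.
    ratio: fraction of wrong positions to reveal (illustrative knob).
    """
    assert len(target) == len(prediction)
    # Count positions the parallel pass got wrong; more errors -> reveal more.
    n_wrong = sum(1 for t, p in zip(target, prediction) if t != p)
    n_reveal = int(ratio * n_wrong)
    # Randomly choose which positions to reveal as ground-truth tokens.
    reveal_idx = set(random.sample(range(len(target)), n_reveal))
    # All other positions stay masked for the model to predict again.
    return [t if i in reveal_idx else MASK for i, t in enumerate(target)]
```

In this sketch, a perfect parallel prediction reveals nothing (the model needs no hints), while a poor prediction reveals many target tokens, giving the model an easier residual learning problem on the remaining positions.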