Paper Title

Diffusion Glancing Transformer for Parallel Sequence to Sequence Learning

Paper Authors

Lihua Qian, Mingxuan Wang, Yang Liu, Hao Zhou

Abstract

Previously, non-autoregressive models were widely perceived as being superior in generation efficiency but inferior in generation quality due to the difficulties of modeling multiple target modalities. To enhance the multi-modality modeling ability, we propose the diffusion glancing transformer, which employs a modality diffusion process and residual glancing sampling. The modality diffusion process is a discrete process that interpolates the multi-modal distribution along the decoding steps, and the residual glancing sampling approach guides the model to continuously learn the remaining modalities across the layers. Experimental results on various machine translation and text generation benchmarks demonstrate that DIFFGLAT achieves better generation accuracy while maintaining fast decoding speed compared with both autoregressive and non-autoregressive models.
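The abstract does not spell out how residual glancing sampling works. As a rough, hypothetical illustration only (modeled on the glancing sampling idea from the earlier Glancing Transformer line of work, not on DIFFGLAT's actual implementation), one pass might reveal a fraction of ground-truth tokens proportional to how many the model predicted wrong, leaving the rest masked for the next pass. All names and the mask token below are placeholders:

```python
import random

MASK = "<mask>"  # placeholder mask symbol, not the paper's actual token

def glancing_sample(pred_tokens, target_tokens, ratio=0.5):
    """Illustrative glancing step: the more predictions are wrong,
    the more gold tokens are revealed as input for the next pass."""
    # count mispredicted positions
    wrong = [i for i, (p, t) in enumerate(zip(pred_tokens, target_tokens)) if p != t]
    n_reveal = int(len(wrong) * ratio)
    # randomly choose positions to reveal, anywhere in the sequence
    reveal = set(random.sample(range(len(target_tokens)), n_reveal)) if n_reveal else set()
    # revealed positions get gold tokens; the rest stay masked
    return [target_tokens[i] if i in reveal else MASK for i in range(len(target_tokens))]
```

For example, a fully correct prediction reveals nothing (all positions stay masked), while a fully wrong prediction with `ratio=1.0` reveals every gold token. DIFFGLAT applies this kind of sampling *across layers* rather than across training iterations, but the abstract gives no further detail.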
