Paper Title
Modeling Future Cost for Neural Machine Translation
Paper Authors
Paper Abstract
Existing neural machine translation (NMT) systems use sequence-to-sequence neural networks to generate the target translation word by word, encouraging the word generated at each time-step to be as consistent as possible with its counterpart in the reference. However, the trained translation model tends to focus on ensuring the accuracy of the target word generated at the current time-step and does not consider its future cost, i.e., the expected cost of generating the subsequent target translation (the next target word). To address this issue, we propose a simple and effective method to model the future cost of each target word for NMT systems. Specifically, a time-dependent future cost is estimated from the currently generated target word and its contextual information to boost the training of the NMT model. Furthermore, the future context representation learned at the current time-step is used to help generate the next target word during decoding. Experimental results on three widely used translation datasets, WMT14 German-to-English, WMT14 English-to-French, and WMT17 Chinese-to-English, show that the proposed approach achieves significant improvements over a strong Transformer-based NMT baseline.
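The abstract describes two components: a time-dependent future-cost estimate, computed from the current target word and its context and used as an extra training signal, and a future context representation fed forward to assist the next decoding step. Below is a minimal PyTorch sketch of one way such a mechanism could look. The module name (FutureCostHead), the concatenate-then-gate fusion, and the auxiliary loss weighting are all illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FutureCostHead(nn.Module):
    """Hypothetical future-cost head (a sketch, not the paper's exact model).

    At time-step t it fuses the decoder state h_t with the embedding of the
    word just generated, e(y_t), to form a future context vector, then
    (a) scores the *next* target word with it as an auxiliary training
    signal and (b) returns the vector (plus a gate) so the decoder can mix
    it into the next step's input.
    """

    def __init__(self, d_model: int, vocab_size: int):
        super().__init__()
        self.fuse = nn.Linear(2 * d_model, d_model)  # [h_t ; e(y_t)] -> future context
        self.gate = nn.Linear(2 * d_model, d_model)  # how much future info to pass on
        self.proj = nn.Linear(d_model, vocab_size)   # auxiliary scores for y_{t+1}

    def forward(self, h_t: torch.Tensor, y_t_emb: torch.Tensor):
        fused = torch.cat([h_t, y_t_emb], dim=-1)
        future_ctx = torch.tanh(self.fuse(fused))    # future context representation
        g = torch.sigmoid(self.gate(fused))          # element-wise mixing gate
        logits_next = self.proj(future_ctx)          # predicts the next target word
        return future_ctx, g, logits_next


def future_cost_loss(logits_next: torch.Tensor,
                     y_next: torch.Tensor,
                     pad_idx: int = 0) -> torch.Tensor:
    """Auxiliary cross-entropy on the next target word; assumed to be added
    to the standard NMT loss with some weight during training."""
    return F.cross_entropy(
        logits_next.reshape(-1, logits_next.size(-1)),
        y_next.reshape(-1),
        ignore_index=pad_idx,
    )
```

Under this reading, training would minimize the usual word-level loss plus a weighted future_cost_loss, while at inference the gated future_ctx would be combined with the next step's decoder input; the precise combination and loss weight would have to come from the full paper.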