Paper Title

Multilingual Mix: Example Interpolation Improves Multilingual Neural Machine Translation

Paper Authors

Yong Cheng, Ankur Bapna, Orhan Firat, Yuan Cao, Pidong Wang, Wolfgang Macherey

Paper Abstract

Multilingual neural machine translation models are trained to maximize the likelihood of a mix of examples drawn from multiple language pairs. The dominant inductive bias applied to these models is a shared vocabulary and a shared set of parameters across languages; the inputs and labels corresponding to examples drawn from different language pairs might still reside in distinct sub-spaces. In this paper, we introduce multilingual crossover encoder-decoder (mXEncDec) to fuse language pairs at an instance level. Our approach interpolates instances from different language pairs into joint 'crossover examples' in order to encourage sharing input and output spaces across languages. To ensure better fusion of examples in multilingual settings, we propose several techniques to improve example interpolation across dissimilar languages under heavy data imbalance. Experiments on a large-scale WMT multilingual dataset demonstrate that our approach significantly improves quality on English-to-Many, Many-to-English and zero-shot translation tasks (from +0.5 BLEU up to +5.5 BLEU points). Results on code-switching sets demonstrate the capability of our approach to improve model generalization to out-of-distribution multilingual examples. We also conduct qualitative and quantitative representation comparisons to analyze the advantages of our approach at the representation level.
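
To make the core idea concrete, the sketch below illustrates mixup-style interpolation of two training examples from different language pairs into a single "crossover example". This is a minimal, hypothetical simplification, not the paper's exact mXEncDec formulation: the function name `crossover_example`, the embedding-level convex combination, and the Beta-distributed mixing ratio are assumptions for illustration; the actual method fuses examples inside the encoder-decoder and adds further techniques to handle dissimilar languages and data imbalance.

```python
import numpy as np

def crossover_example(src_a, src_b, tgt_a, tgt_b, alpha=0.2, rng=None):
    """Fuse two examples from different language pairs into one
    'crossover example' via convex interpolation (mixup-style sketch;
    a hypothetical simplification, not the paper's mXEncDec).

    src_*/tgt_* are (seq_len, d_model) embedding matrices, assumed to be
    padded to a shared length. Returns the mixed source and target
    embeddings plus the mixing ratio lam, which would weight the two
    label sequences in the training loss.
    """
    if rng is None:
        rng = np.random.default_rng()
    lam = rng.beta(alpha, alpha)              # mixing ratio ~ Beta(alpha, alpha)
    src_mix = lam * src_a + (1.0 - lam) * src_b
    tgt_mix = lam * tgt_a + (1.0 - lam) * tgt_b
    return src_mix, tgt_mix, lam

# Toy usage: mix an en-de example with an en-fr example (random stand-ins
# for embedded token sequences).
rng = np.random.default_rng(0)
seq_len, d_model = 8, 16
src_en_de = rng.normal(size=(seq_len, d_model))
tgt_en_de = rng.normal(size=(seq_len, d_model))
src_en_fr = rng.normal(size=(seq_len, d_model))
tgt_en_fr = rng.normal(size=(seq_len, d_model))
src_mix, tgt_mix, lam = crossover_example(src_en_de, src_en_fr, tgt_en_de, tgt_en_fr)
print(f"lam = {lam:.3f}, mixed source shape = {src_mix.shape}")
```

Interpolating on both the source and target sides is what encourages the shared input and output spaces the abstract describes: the model is trained on points that lie between language pairs rather than only on examples from each pair's own sub-space.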
