Paper Title

Analyzing the Source and Target Contributions to Predictions in Neural Machine Translation

Paper Authors

Elena Voita, Rico Sennrich, Ivan Titov

Paper Abstract

In Neural Machine Translation (and, more generally, conditional language modeling), the generation of a target token is influenced by two types of context: the source and the prefix of the target sequence. While many attempts to understand the internal workings of NMT models have been made, none of them explicitly evaluates relative source and target contributions to a generation decision. We argue that this relative contribution can be evaluated by adopting a variant of Layerwise Relevance Propagation (LRP). Its underlying 'conservation principle' makes relevance propagation unique: differently from other methods, it evaluates not an abstract quantity reflecting token importance, but the proportion of each token's influence. We extend LRP to the Transformer and conduct an analysis of NMT models which explicitly evaluates the source and target relative contributions to the generation process. We analyze changes in these contributions when conditioning on different types of prefixes, when varying the training objective or the amount of training data, and during the training process. We find that models trained with more data tend to rely on source information more and to have more sharp token contributions; the training process is non-monotonic with several stages of different nature.
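To make the "conservation principle" and the notion of relative source/target contributions concrete, the following is a minimal toy sketch, not the paper's actual Transformer extension: an epsilon-LRP rule for a single linear layer, followed by turning the resulting per-position relevances into source vs. target-prefix proportions. All function names, shapes, and the split into "source" and "target" positions are illustrative assumptions.

```python
import numpy as np

# Toy illustration of LRP's conservation principle and of relative
# source/target contributions. This is NOT the paper's Transformer
# extension; lrp_linear and the position split are hypothetical.

def lrp_linear(x, W, relevance_out, eps=1e-6):
    """Redistribute output relevance to the inputs with the epsilon-LRP rule.

    x:             input activations, shape (d_in,)
    W:             weight matrix, shape (d_in, d_out)
    relevance_out: relevance of each output unit, shape (d_out,)
    Returns per-input relevance; its sum is (approximately) conserved.
    """
    z = x @ W                                    # pre-activations, shape (d_out,)
    s = relevance_out / (z + eps * np.sign(z))   # stabilized ratio
    return x * (W @ s)                           # relevance of each input unit

# Toy setup: 3 "source" positions and 2 "target-prefix" positions feed one layer.
rng = np.random.default_rng(0)
x = rng.normal(size=5)
W = rng.normal(size=(5, 4))
R_out = np.abs(rng.normal(size=4))               # relevance at the layer output

R_in = lrp_linear(x, W, R_out)
print("conservation:", R_out.sum(), "~", R_in.sum())

# Relative contributions: the proportion of total relevance assigned to
# source positions vs. target-prefix positions (first 3 vs. last 2 here).
total = np.abs(R_in).sum()
source_contribution = np.abs(R_in[:3]).sum() / total
target_contribution = np.abs(R_in[3:]).sum() / total
print("source:", source_contribution, "target:", target_contribution)
```

In the paper itself, relevance is propagated through the full Transformer rather than a single layer, and contributions are normalized so that the source and target-prefix shares for each generation step sum to one; the absolute values above are only a simplification to keep the toy proportions well defined.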
