A new approach to calculating BERTScore for automatic assessment of translation quality

Paper title

A new approach to calculating BERTScore for automatic assessment of translation quality

Authors

Vetrov, A. A., Gorn, E. A.

Abstract

This study examines the applicability of the BERTScore metric to sentence-level translation quality assessment for the English -> Russian direction. Experiments were performed with a pre-trained Multilingual BERT as well as with a pair of Monolingual BERT models. To align the monolingual embeddings, an orthogonal transformation based on anchor tokens was used. This transformation is shown to help prevent mismatching issues, and the approach gives better results than using embeddings from the Multilingual model. To improve the token matching process, it is proposed to combine all incomplete WordPiece tokens into meaningful words, using a simple average of the corresponding vectors, and to calculate BERTScore based on anchor tokens only. Such modifications allowed us to achieve a better correlation of the model predictions with human judgments. In addition to machine translation, several versions of human translation were evaluated as well, and the problems of this approach are listed.
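
The two token-level steps named in the abstract, merging incomplete WordPiece tokens by averaging their vectors and scoring by greedy token matching, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`merge_wordpieces`, `bertscore_f1`), the `##` continuation prefix, and the toy 2-dimensional vectors are assumptions made for illustration, the embeddings are assumed to already live in a shared (aligned) space, and the optional idf weighting and anchor-token filtering are omitted.

```python
import math

def merge_wordpieces(tokens, vectors):
    """Merge '##' continuation pieces into whole words, averaging their vectors."""
    words, groups = [], []
    for tok, vec in zip(tokens, vectors):
        if tok.startswith("##") and words:
            words[-1] += tok[2:]      # glue the piece onto the previous word
            groups[-1].append(vec)    # remember its vector for averaging
        else:
            words.append(tok)
            groups.append([vec])
    # Simple per-dimension average over the pieces of each word.
    merged = [[sum(col) / len(g) for col in zip(*g)] for g in groups]
    return words, merged

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def bertscore_f1(cand_vecs, ref_vecs):
    """Greedy matching: each token is paired with its most similar counterpart."""
    precision = sum(max(cosine(c, r) for r in ref_vecs) for c in cand_vecs) / len(cand_vecs)
    recall = sum(max(cosine(r, c) for c in cand_vecs) for r in ref_vecs) / len(ref_vecs)
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0

# Toy 2-d vectors standing in for real BERT embeddings (made up for illustration).
tokens = ["trans", "##lation", "quality"]
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
words, merged = merge_wordpieces(tokens, vectors)
print(words)   # ['translation', 'quality']
print(merged)  # [[0.5, 0.5], [1.0, 1.0]]
```

In a real pipeline the vectors would come from the contextual layers of the BERT models, and the monolingual embeddings would first be mapped into a common space by the orthogonal transformation before `bertscore_f1` is applied.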
