Title


Decoding and Diversity in Machine Translation

Authors

Nicholas Roberts, Davis Liang, Graham Neubig, Zachary C. Lipton

Abstract


Neural Machine Translation (NMT) systems are typically evaluated using automated metrics that assess the agreement between generated translations and ground truth candidates. To improve systems with respect to these metrics, NLP researchers employ a variety of heuristic techniques, including searching for the conditional mode (vs. sampling) and incorporating various training heuristics (e.g., label smoothing). While search strategies significantly improve BLEU score, they yield deterministic outputs that lack the diversity of human translations. Moreover, search tends to bias the distribution of translated gender pronouns. This makes human-level BLEU a misleading benchmark in that modern MT systems cannot approach human-level BLEU while simultaneously maintaining human-level translation diversity. In this paper, we characterize distributional differences between generated and real translations, examining the cost in diversity paid for the BLEU scores enjoyed by NMT. Moreover, our study implicates search as a salient source of known bias when translating gender pronouns.
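The abstract's central contrast between searching for the conditional mode and sampling from the model's conditional distribution can be made concrete with a small decoding sketch. The snippet below is an illustrative example, not the paper's own code: it assumes the Hugging Face transformers library and the public Helsinki-NLP/opus-mt-en-de checkpoint, which are stand-ins for whatever NMT systems the paper actually evaluates.

```python
# Minimal sketch (not the paper's code): mode-seeking decoding (beam search)
# vs. ancestral sampling for a pretrained NMT model. The model checkpoint and
# example sentence are illustrative assumptions.
from transformers import MarianMTModel, MarianTokenizer

model_name = "Helsinki-NLP/opus-mt-en-de"  # hypothetical choice of MT system
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

source = "The doctor asked the nurse to help her with the procedure."
inputs = tokenizer(source, return_tensors="pt")

# Mode-seeking: beam search returns a single, deterministic translation,
# which tends to score well on BLEU but collapses output diversity.
beam_out = model.generate(**inputs, num_beams=5, do_sample=False, max_new_tokens=64)
print("beam:", tokenizer.batch_decode(beam_out, skip_special_tokens=True)[0])

# Sampling from the conditional distribution yields varied translations,
# closer to the spread of human references but with lower average BLEU.
sample_out = model.generate(
    **inputs,
    do_sample=True,
    top_k=0,            # no truncation: pure ancestral sampling
    temperature=1.0,
    num_return_sequences=3,
    max_new_tokens=64,
)
for i, hyp in enumerate(tokenizer.batch_decode(sample_out, skip_special_tokens=True)):
    print(f"sample {i}:", hyp)
```

Running the sketch several times shows the qualitative effect the paper quantifies: the beam output never changes, while the sampled translations vary from run to run, including in how gendered pronouns are rendered.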
