Paper Title


Disentangling Uncertainty in Machine Translation Evaluation

Authors

Chrysoula Zerva, Taisiya Glushkova, Ricardo Rei, André F. T. Martins

Abstract


Trainable evaluation metrics for machine translation (MT) exhibit strong correlation with human judgements, but they are often hard to interpret and might produce unreliable scores under noisy or out-of-domain data. Recent work has attempted to mitigate this with simple uncertainty quantification techniques (Monte Carlo dropout and deep ensembles); however, these techniques (as we show) are limited in several ways -- for example, they are unable to distinguish between different kinds of uncertainty, and they are time- and memory-consuming. In this paper, we propose more powerful and efficient uncertainty predictors for MT evaluation, and we assess their ability to target different sources of aleatoric and epistemic uncertainty. To this end, we develop and compare training objectives for the COMET metric to enhance it with an uncertainty prediction output, including heteroscedastic regression, divergence minimization, and direct uncertainty prediction. Our experiments show improved results on uncertainty prediction for the WMT metrics task datasets, with a substantial reduction in computational costs. Moreover, they demonstrate the ability of these predictors to address specific uncertainty causes in MT evaluation, such as low-quality references and out-of-domain data.
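One of the training objectives the abstract mentions is heteroscedastic regression, where the metric predicts both a quality score (mean) and a per-example variance, trained with a Gaussian negative log-likelihood. The sketch below illustrates that loss in plain NumPy; it is a minimal illustration of the general technique, not the paper's actual COMET implementation, and the function name `gaussian_nll` is ours.

```python
import numpy as np

def gaussian_nll(mu, log_var, y):
    """Heteroscedastic Gaussian negative log-likelihood (constant dropped).

    The model predicts a per-example mean `mu` (the quality score) and
    log-variance `log_var` (the uncertainty); minimizing this loss trains
    both jointly, so large errors can be "explained away" by predicting
    a larger variance, at the cost of the log-variance penalty term.
    """
    var = np.exp(log_var)
    return 0.5 * (log_var + (y - mu) ** 2 / var)

# Toy check: for a fixed residual, the loss is minimized when the
# predicted variance matches the squared error, i.e. log_var ~ log(residual^2).
residual = 0.5
candidates = np.linspace(-3.0, 1.0, 81)
losses = {lv: gaussian_nll(0.0, lv, residual) for lv in candidates}
best_log_var = min(losses, key=losses.get)  # close to log(0.25) ~ -1.386
```

This recovers the standard calibration property of the objective: the optimal predicted variance for an example equals its expected squared error, which is what lets the variance head act as an uncertainty estimate.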
