Exploiting Sentence Order in Document Alignment

Paper Title

Exploiting Sentence Order in Document Alignment

Authors

Brian Thompson, Philipp Koehn

Abstract

We present a simple document alignment method that incorporates sentence order information in both candidate generation and candidate re-scoring. Our method results in a 61% relative reduction in error compared to the best previously published result on the WMT16 document alignment shared task. Our method improves downstream MT performance on web-scraped Sinhala-English documents from ParaCrawl, outperforming the document alignment method used in the most recent ParaCrawl release. It also outperforms a comparable corpora method which uses the same multilingual embeddings, demonstrating that exploiting sentence order is beneficial even if the end goal is sentence-level bitext.
