Paper Title

Exploiting Neural Query Translation into Cross Lingual Information Retrieval

Authors

Liang Yao, Baosong Yang, Haibo Zhang, Weihua Luo, Boxing Chen

Abstract

As a crucial component of cross-language information retrieval (CLIR), query translation faces three main challenges: 1) the adequacy of translation; 2) the lack of in-domain parallel training data; and 3) the requirement of low latency. For these reasons, existing CLIR systems mainly rely on statistical machine translation (SMT) rather than the more advanced neural machine translation (NMT), which limits further improvements in both translation and retrieval quality. In this paper, we investigate how to exploit a neural query translation model in a CLIR system. Specifically, we propose a novel data augmentation method that extracts query translation pairs from user clickthrough data, thereby alleviating the domain-adaptation problem in NMT. We then introduce an asynchronous strategy that combines the real-time responsiveness of SMT with the accuracy of NMT. Experimental results show that the proposed approach yields better retrieval quality than strong baselines and can be applied effectively to a real-world CLIR system, namely the AliExpress e-commerce search engine. Readers can examine and test their own cases on our website: https://aliexpress.com .
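The abstract does not describe how the asynchronous strategy is implemented. One plausible reading, offered here only as a hypothetical sketch and not as the authors' system, is a translation cache: the first time a query arrives, the low-latency SMT output is served immediately while an NMT translation is computed in the background; later requests for the same query hit the cache and receive the NMT output. The class `AsyncQueryTranslator`, its `drain()` and `close()` helpers, and the stand-in `smt`/`nmt` callables are all assumptions made for illustration.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, Optional


class AsyncQueryTranslator:
    """Hypothetical sketch: serve fast SMT output now, upgrade to cached NMT output later."""

    def __init__(self, smt: Callable[[str], str], nmt: Callable[[str], str], workers: int = 2) -> None:
        self._smt = smt                        # low-latency translator (stand-in for SMT)
        self._nmt = nmt                        # slower, more accurate translator (stand-in for NMT)
        self._cache: Dict[str, str] = {}       # query -> finished NMT translation
        self._pending: Dict[str, Future] = {}  # queries whose NMT job is still running
        self._lock = threading.Lock()
        self._pool = ThreadPoolExecutor(max_workers=workers)

    def _run_nmt(self, query: str) -> None:
        """Background job: translate with NMT and store the result in the cache."""
        translation: Optional[str]
        try:
            translation = self._nmt(query)
        except Exception:
            translation = None  # on failure, fall back to SMT; the job is retried on the next request
        with self._lock:
            if translation is not None:
                self._cache[query] = translation
            self._pending.pop(query, None)

    def translate(self, query: str) -> str:
        """Return cached NMT output if available; otherwise return SMT output and schedule NMT once."""
        with self._lock:
            cached = self._cache.get(query)
            if cached is not None:
                return cached
            if query not in self._pending:
                self._pending[query] = self._pool.submit(self._run_nmt, query)
        return self._smt(query)

    def drain(self) -> None:
        """Block until all currently scheduled NMT jobs have finished (useful for tests)."""
        with self._lock:
            futures = list(self._pending.values())
        for future in futures:
            future.result()

    def close(self) -> None:
        """Wait for background jobs and release the worker threads."""
        self._pool.shutdown(wait=True)


translator = AsyncQueryTranslator(smt=lambda q: f"smt:{q}", nmt=lambda q: f"nmt:{q}")
first = translator.translate("zapatos rojos")   # served by SMT; NMT scheduled in the background
translator.drain()
second = translator.translate("zapatos rojos")  # served from the NMT cache
translator.close()
print(first, "|", second)
```

In this sketch the request path is bounded by SMT latency, and NMT quality reaches users only once the background job for that query has finished, so the benefit would come mainly from queries that repeat.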
