Leveraging Pseudo-labeled Data to Improve Direct Speech-to-Speech Translation

Paper Title

Leveraging Pseudo-labeled Data to Improve Direct Speech-to-Speech Translation

Authors

Qianqian Dong, Fengpeng Yue, Tom Ko, Mingxuan Wang, Qibing Bai, Yu Zhang

Abstract

Direct speech-to-speech translation (S2ST) has drawn increasing attention recently. The task is very challenging due to data scarcity and the complex speech-to-speech mapping. In this paper, we report our recent achievements in S2ST. First, we build an S2ST Transformer baseline that outperforms the original Translatotron. Second, we utilize external data through pseudo-labeling and obtain a new state-of-the-art result on the Fisher English-to-Spanish test set. Indeed, we exploit the pseudo data with a combination of popular techniques that are not trivial to apply to S2ST. Moreover, we evaluate our approach on both syntactically similar (Spanish-English) and distant (English-Chinese) language pairs. Our implementation is available at https://github.com/fengpeng-yue/speech-to-speech-translation.

Paper Title