Paper Title
Leveraging Pseudo-labeled Data to Improve Direct Speech-to-Speech Translation
Paper Authors
Paper Abstract
Direct speech-to-speech translation (S2ST) has drawn increasing attention recently. The task is very challenging because of data scarcity and the complexity of the speech-to-speech mapping. In this paper, we report our recent achievements in S2ST. First, we build an S2ST Transformer baseline that outperforms the original Translatotron. Second, we utilize external data through pseudo-labeling and obtain a new state-of-the-art result on the Fisher English-to-Spanish test set. Specifically, we exploit the pseudo-labeled data with a combination of popular techniques that are nontrivial to apply to S2ST. Moreover, we evaluate our approach on both syntactically similar (Spanish-English) and distant (English-Chinese) language pairs. Our implementation is available at https://github.com/fengpeng-yue/speech-to-speech-translation.
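The abstract does not say how the pseudo labels are produced. As an illustration only, the sketch below shows one common recipe, assumed here rather than taken from the paper: start from external speech recognition data (source audio with transcripts), translate each transcript with a text translation model, and synthesize the translation with a text-to-speech model. The names `PseudoPair`, `pseudo_label`, `translate`, and `synthesize` are hypothetical and do not come from the authors' repository.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class PseudoPair:
    """One pseudo-labeled S2ST training example (hypothetical structure)."""
    source_audio: str  # path or ID of real source-language speech
    target_text: str   # machine-translated text (pseudo label)
    target_audio: str  # path or ID of synthesized target speech (pseudo label)


def pseudo_label(
    asr_corpus: Iterable[Tuple[str, str]],
    translate: Callable[[str], str],
    synthesize: Callable[[str], str],
) -> List[PseudoPair]:
    """Turn (audio, transcript) pairs into pseudo-labeled S2ST pairs.

    `translate` and `synthesize` are placeholders for whatever MT and TTS
    systems are available; they are assumptions, not the paper's models.
    """
    pairs = []
    for audio, transcript in asr_corpus:
        target_text = translate(transcript)      # text-level pseudo label
        target_audio = synthesize(target_text)   # speech-level pseudo label
        pairs.append(PseudoPair(audio, target_text, target_audio))
    return pairs
```

The resulting pairs could then be mixed with the real parallel speech data for training; how the two are combined is one of the design choices the abstract describes as nontrivial.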