Improving End-to-end Speech Translation by Leveraging Auxiliary Speech and Text Data

Paper Title

Improving End-to-end Speech Translation by Leveraging Auxiliary Speech and Text Data

Authors

Yuhao Zhang, Chen Xu, Bojie Hu, Chunliang Zhang, Tong Xiao, Jingbo Zhu

Abstract

We present a method for introducing a text encoder into pre-trained end-to-end speech translation systems. It enhances the ability to adapt one modality (i.e., source-language speech) to another (i.e., source-language text). Thus, the speech translation model can learn from both unlabeled and labeled data, especially when source-language text data is abundant. Beyond this, we present a denoising method to build a robust text encoder that can deal with both normal and noisy text data. Our system sets a new state of the art on the MuST-C En-De, En-Fr, and LibriSpeech En-Fr tasks.

Paper Title