Paper title
Improving End-to-end Speech Translation by Leveraging Auxiliary Speech and Text Data
Paper authors
Abstract
We present a method for introducing a text encoder into pre-trained end-to-end speech translation systems. It enhances the ability to adapt one modality (i.e., source-language speech) to another (i.e., source-language text). Thus, the speech translation model can learn from both unlabeled and labeled data, especially when the source-language text data is abundant. Beyond this, we present a denoising method to build a robust text encoder that can deal with both normal and noisy text data. Our system sets a new state of the art on the MuST-C En-De, En-Fr, and LibriSpeech En-Fr tasks.