Paper title
Multilingual Simultaneous Speech Translation
Paper authors
Paper abstract
Applications designed for simultaneous speech translation during events such as conferences or meetings need to balance quality and lag while displaying translated text to deliver a good user experience. One common approach to building online spoken language translation systems is to leverage models built for offline speech translation. Based on a technique to adapt end-to-end monolingual models, we investigate the ability of multilingual models and different architectures (end-to-end and cascade) to perform online speech translation. On the multilingual TEDx corpus, we show that the approach generalizes to different architectures. We see similar gains in latency reduction (40% relative) across languages and architectures. However, the end-to-end architecture leads to smaller translation quality losses after adapting to the online model. Furthermore, the approach even scales to zero-shot directions.