Direct Speech-to-speech Translation without Textual Annotation using Bottleneck Features

Paper title

Direct Speech-to-speech Translation without Textual Annotation using Bottleneck Features

Authors

Zhang, Junhui; Pan, Junjie; Yin, Xiang; Ma, Zejun

Abstract

Speech-to-speech translation directly translates a speech utterance in one language into speech in another, and has great potential in tasks such as simultaneous interpretation. State-of-the-art models usually contain an auxiliary module for phoneme sequence prediction, which requires textual annotation of the training dataset. We propose a direct speech-to-speech translation model that can be trained without any textual annotation or content information. Instead of introducing an auxiliary phoneme prediction task in the model, we propose to use bottleneck features as intermediate training objectives to ensure the translation performance of the system. Experiments on Mandarin-Cantonese speech translation demonstrate the feasibility of the proposed approach, and its performance can match that of a cascaded system with respect to translation and synthesis quality.

Paper title