A Language Agnostic Multilingual Streaming On-Device ASR System

Paper Title

A Language Agnostic Multilingual Streaming On-Device ASR System

Authors

Bo Li, Tara N. Sainath, Ruoming Pang, Shuo-yiin Chang, Qiumin Xu, Trevor Strohman, Vince Chen, Qiao Liang, Heguang Liu, Yanzhang He, Parisa Haghani, Sameer Bidichandani

Abstract

On-device end-to-end (E2E) models have shown improvements over a conventional model on English Voice Search tasks in both quality and latency. E2E models have also shown promising results for multilingual automatic speech recognition (ASR). In this paper, we extend our previous capacity solution to streaming applications and present a streaming multilingual E2E ASR system that runs fully on device with comparable quality and latency to individual monolingual models. To achieve that, we propose an Encoder Endpointer model and an End-of-Utterance (EOU) Joint Layer for a better quality and latency trade-off. Our system is built in a language agnostic manner allowing it to natively support intersentential code switching in real time. To address the feasibility concerns on large models, we conducted on-device profiling and replaced the time consuming LSTM decoder with the recently developed Embedding decoder. With these changes, we managed to run such a system on a mobile device in less than real time.
