Title
Language-agnostic Multilingual Modeling
Authors
Abstract
Multilingual Automatic Speech Recognition (ASR) systems allow for the joint training of data-rich and data-scarce languages in a single model. This enables data and parameter sharing across languages, which is especially beneficial for the data-scarce languages. However, most state-of-the-art multilingual models require the encoding of language information and therefore are not as flexible or scalable when expanding to newer languages. Language-independent multilingual models help to address this issue, and are also better suited for multicultural societies where several languages are frequently used together (but often rendered with different writing systems). In this paper, we propose a new approach to building a language-agnostic multilingual ASR system which transforms all languages to one writing system through a many-to-one transliteration transducer. Thus, similar-sounding acoustics are mapped to a single, canonical target sequence of graphemes, effectively separating the modeling and rendering problems. We show with four Indic languages, namely, Hindi, Bengali, Tamil and Kannada, that the language-agnostic multilingual model achieves up to 10% relative reduction in Word Error Rate (WER) over a language-dependent multilingual model.
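The core idea of the many-to-one transliteration step can be sketched in a few lines. The paper uses a finite-state transliteration transducer; the dictionary-based mapping below is only a toy illustration of the same principle, and the canonical Latin-like target strings are an assumption chosen for readability, not the paper's actual grapheme inventory.

```python
# Toy sketch of many-to-one transliteration: the same consonant written in
# four Indic scripts collapses to one canonical target grapheme sequence,
# so similar-sounding units share a single modeling target.
# The mapping table and Latin-like canonical forms are illustrative only.

CANONICAL = {
    # "ka" consonant in four scripts -> one canonical sequence
    "क": "ka",  # Devanagari (Hindi)
    "ক": "ka",  # Bengali
    "க": "ka",  # Tamil
    "ಕ": "ka",  # Kannada
    # "ma" consonant
    "म": "ma",  # Devanagari (Hindi)
    "ম": "ma",  # Bengali
    "ம": "ma",  # Tamil
    "ಮ": "ma",  # Kannada
}

def transliterate(text: str) -> str:
    """Map each known grapheme to its canonical form; pass others through."""
    return "".join(CANONICAL.get(ch, ch) for ch in text)

# Hindi and Bengali renderings of the same sounds map to one target:
assert transliterate("कम") == transliterate("কম") == "kama"
```

In the actual system this mapping is realized as a transducer composed into the training pipeline, so the acoustic model only ever sees the canonical script; rendering back into each language's native script is handled separately.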