Paper Title

Study of Indian English Pronunciation Variabilities relative to Received Pronunciation

Authors

Pal, Priyanshi; Jain, Shelly; Vuppala, Anil; Yarra, Chiranjeevi; Ghosh, Prasanta

Abstract

Analysis of Indian English (IE) pronunciation variabilities is useful in building systems for Automatic Speech Recognition (ASR) and Text-to-Speech (TTS) synthesis in the Indian context. Typically, these pronunciation variabilities have been explored by comparing IE pronunciation with Received Pronunciation (RP). However, exploring these variabilities requires pronunciation data labelled at the phonetic level, which is scarce for IE. Moreover, the versatility of IE stems from the wide diversity of its speakers' mother tongues and from regional demographic differences. Prior linguistic works have characterised IE variabilities qualitatively by reporting phonetic rules that represent such variations relative to RP. These qualitative descriptions often lack quantitative descriptors and a data-driven analysis of diverse IE pronunciation data to characterise IE at the phonetic level. To address these issues, in this work we consider a corpus, Indic TIMIT, containing a large set of IE varieties from 80 speakers from various regions of India. We present an analysis that obtains a new set of phonetic rules representing IE pronunciation variabilities relative to RP in a data-driven manner. We do this using 15,974 phonetic transcriptions, of which 13,632 were obtained manually in addition to those that are part of the corpus. Furthermore, we validate the rules obtained from the analysis against the existing phonetic rules to identify their relevance, and we test the efficacy of a Grapheme-to-Phoneme (G2P) conversion developed from the obtained rules, using Phoneme Error Rate (PER) as the performance metric.
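The abstract names Phoneme Error Rate (PER) as the metric but does not define it. A minimal sketch of how PER is commonly computed is given below, assuming the usual definition: the Levenshtein (edit) distance between the reference and predicted phoneme sequences, summed over all utterances and divided by the total number of reference phonemes. The function names and the toy phoneme symbols are illustrative assumptions and are not taken from the paper or from Indic TIMIT.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences.

    Counts the minimum number of substitutions, insertions, and
    deletions needed to turn `hyp` into `ref`.
    """
    # prev[j] holds the distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance between ref[:i] and an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference phoneme missing)
                curr[j - 1] + 1,     # insertion (extra predicted phoneme)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def phoneme_error_rate(refs: List[Sequence[str]],
                       hyps: List[Sequence[str]]) -> float:
    """Corpus-level PER: total edit errors / total reference phonemes."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same number of items")
    total_ref = sum(len(r) for r in refs)
    if total_ref == 0:
        raise ValueError("reference transcriptions are empty")
    total_errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return total_errors / total_ref


# Toy example with made-up symbols (not data from the paper):
# one substitution (W -> V) and one insertion (R) against 4 reference phonemes.
reference = ["W", "AO", "T", "ER"]
predicted = ["V", "AO", "T", "ER", "R"]
per = phoneme_error_rate([reference], [predicted])
print(f"PER = {per:.2f}")
```

Under this definition PER can exceed 1.0 when the prediction contains many insertions, so it is an error rate rather than a bounded accuracy score. Whether the authors aggregate per utterance or over the whole corpus is not stated in the abstract; the sketch assumes corpus-level aggregation.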