Paper title
The Perceptimatic English Benchmark for Speech Perception Models
Paper authors
Paper abstract
We present the Perceptimatic English Benchmark, an open experimental benchmark for evaluating quantitative models of speech perception in English. The benchmark consists of ABX stimuli along with the responses of 91 American English-speaking listeners. The stimuli test discrimination of a large number of English and French phonemic contrasts. They are extracted directly from corpora of read speech, making them appropriate for evaluating statistical acoustic models (such as those used in automatic speech recognition) trained on typical speech data sets. We show that phone discrimination is correlated with several types of models, and give recommendations for researchers seeking easily calculated norms of acoustic distance on experimental stimuli. We show that DeepSpeech, a standard English speech recognizer, is more specialized on English phoneme discrimination than English listeners, and is poorly correlated with their behaviour, even though it yields a low error on the decision task given to humans.
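To make the ABX setup concrete, the following is a minimal sketch of how a model's acoustic distance could be turned into an ABX decision: given feature sequences for A, B, and X, where X belongs to the same category as A, the model counts as correct when X is closer to A than to B. The dynamic-time-warping alignment, the Euclidean frame distance, the length normalization, and the function names `dtw_distance` and `abx_delta` are assumptions made for illustration; they are not taken from the abstract and may differ from the benchmark's actual procedure.

```python
import math


def frame_distance(u, v):
    """Euclidean distance between two feature frames (an assumed choice)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def dtw_distance(seq_a, seq_b):
    """Dynamic-time-warping cost between two sequences of frames.

    The total alignment cost is divided by the summed sequence lengths,
    a simple normalization assumed here so that longer stimuli are not
    penalized merely for being long.
    """
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = frame_distance(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # advance in seq_a only
                cost[i][j - 1],      # advance in seq_b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m] / (n + m)


def abx_delta(a, b, x):
    """Return d(B, X) - d(A, X), where X shares its category with A.

    A positive value means the model places X closer to A (a correct
    response); a negative value means it places X closer to B (an error).
    """
    return dtw_distance(b, x) - dtw_distance(a, x)


# Toy one-dimensional "features": X is acoustically close to A.
a = [[0.0], [0.0]]
b = [[1.0], [1.0]]
x = [[0.1], [0.1], [0.1]]
model_is_correct = abx_delta(a, b, x) > 0
```

Averaging such deltas (or the resulting correct/incorrect decisions) over all triplets for a given contrast yields a per-contrast score that can then be compared with listeners' responses on the same stimuli.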