BEA-Base: A Benchmark for ASR of Spontaneous Hungarian

Paper title

BEA-Base: A Benchmark for ASR of Spontaneous Hungarian

Authors

Mihajlik, P., Balog, A., Gráczi, T. E., Kohári, A., Tarján, B., Mády, K.

Abstract

Hungarian is spoken by 15 million people, still, easily accessible Automatic Speech Recognition (ASR) benchmark datasets - especially for spontaneous speech - have been practically unavailable. In this paper, we introduce BEA-Base, a subset of the BEA spoken Hungarian database comprising mostly spontaneous speech of 140 speakers. It is built specifically to assess ASR, primarily for conversational AI applications. After defining the speech recognition subsets and task, several baselines - including classic HMM-DNN hybrid and end-to-end approaches augmented by cross-language transfer learning - are developed using open-source toolkits. The best results obtained are based on multilingual self-supervised pretraining, achieving a 45% recognition error rate reduction as compared to the classical approach - without the application of an external language model or additional supervised data. The results show the feasibility of using BEA-Base for training and evaluation of Hungarian speech recognition systems.
