Google Crowdsourced Speech Corpora and Related Open-Source Resources for Low-Resource Languages and Dialects: An Overview

Paper title

Google Crowdsourced Speech Corpora and Related Open-Source Resources for Low-Resource Languages and Dialects: An Overview

Authors

Butryna, Alena; Chu, Shan-Hui Cathy; Demirsahin, Isin; Gutkin, Alexander; Ha, Linne; He, Fei; Jansche, Martin; Johny, Cibu; Katanova, Anna; Kjartansson, Oddur; Li, Chenfang; Merkulova, Tatiana; Oo, Yin May; Pipatsrisawat, Knot; Rivera, Clara; Sarin, Supheakmungkol; de Silva, Pasindu; Sodimana, Keshan; Sproat, Richard; Wattanavekin, Theeraphol; Wibawa, Jaka Aris Eko

Abstract

This paper presents an overview of a program designed to address the growing need for developing freely available speech resources for under-represented languages. At present we have released 38 datasets for building text-to-speech and automatic speech recognition applications for languages and dialects of South and Southeast Asia, Africa, Europe and South America. The paper describes the methodology used for developing such corpora and presents some of our findings that could benefit under-represented language communities.

Paper title