Paper Title

Skit-S2I: An Indian Accented Speech to Intent dataset

Paper Authors

Shangeth Rajaa, Swaraj Dalmia, Kumarmanas Nethil

Paper Abstract

Conventional conversation assistants extract text transcripts from the speech signal using automatic speech recognition (ASR) and then predict intent from the transcriptions. Using end-to-end spoken language understanding (SLU), the intents of the speaker are predicted directly from the speech signal without requiring intermediate text transcripts. As a result, the model can optimize directly for intent classification and avoid cascading errors from ASR. The end-to-end SLU system also helps in reducing the latency of the intent prediction model. Although many datasets are available publicly for text-to-intent tasks, the availability of labeled speech-to-intent datasets is limited, and there are no datasets available in the Indian accent. In this paper, we release the Skit-S2I dataset, the first publicly available Indian-accented SLU dataset in the banking domain in a conversational tonality. We experiment with multiple baselines, compare different pretrained speech encoders' representations, and find that SSL pretrained representations perform slightly better than ASR pretrained representations lacking prosodic features for speech-to-intent classification. The dataset and baseline code are available at \url{https://github.com/skit-ai/speech-to-intent-dataset}.
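
To illustrate the end-to-end SLU setup the abstract describes, below is a minimal sketch (not the authors' baseline code from the repository) of an intent classifier built on an SSL-pretrained speech encoder: wav2vec 2.0 BASE from torchaudio produces frame-level representations, which are mean-pooled over time and passed to a linear intent head. The intent count (NUM_INTENTS) and the choice of encoder are placeholders, not values from the paper.

```python
# Minimal sketch of an end-to-end speech-to-intent classifier, assuming an
# SSL-pretrained wav2vec 2.0 BASE encoder from torchaudio. NUM_INTENTS is a
# hypothetical placeholder for the number of intent labels in the dataset.
import torch
import torch.nn as nn
import torchaudio

NUM_INTENTS = 14  # placeholder; set to the dataset's actual intent count


class SpeechToIntentClassifier(nn.Module):
    def __init__(self, num_intents: int = NUM_INTENTS):
        super().__init__()
        bundle = torchaudio.pipelines.WAV2VEC2_BASE   # SSL-pretrained encoder
        self.encoder = bundle.get_model()
        self.sample_rate = bundle.sample_rate          # 16 kHz input audio
        self.head = nn.Linear(768, num_intents)        # 768 = wav2vec2 BASE dim

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        # waveform: (batch, num_samples) mono audio at self.sample_rate
        features, _ = self.encoder.extract_features(waveform)
        pooled = features[-1].mean(dim=1)              # mean-pool over time frames
        return self.head(pooled)                       # (batch, num_intents) logits


if __name__ == "__main__":
    model = SpeechToIntentClassifier()
    dummy_audio = torch.randn(2, 16000)                # two 1-second utterances
    print(model(dummy_audio).shape)                    # torch.Size([2, NUM_INTENTS])
```

Swapping the SSL encoder for an ASR-pretrained one (e.g., a CTC-finetuned wav2vec 2.0 bundle) while keeping the pooling and linear head fixed is one simple way to reproduce the kind of encoder comparison the abstract refers to.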
