Paper Title


Unified Multimodal Punctuation Restoration Framework for Mixed-Modality Corpus

Authors

Yaoming Zhu, Liwei Wu, Shanbo Cheng, Mingxuan Wang

Abstract

The punctuation restoration task aims to correctly punctuate the output transcriptions of automatic speech recognition systems. Previous punctuation models, either using text only or demanding the corresponding audio, tend to be constrained in real-world scenarios, where unpunctuated sentences are a mixture of those with and without audio. This paper proposes a unified multimodal punctuation restoration framework, named UniPunc, to punctuate the mixed sentences with a single model. UniPunc jointly represents audio and non-audio samples in a shared latent space, based on which the model learns a hybrid representation and punctuates both kinds of samples. We validate the effectiveness of UniPunc on real-world datasets, where it outperforms various strong baselines (e.g. BERT, MuSe) by at least 0.8 overall F1 score, setting a new state of the art. Extensive experiments show that UniPunc's design is a pervasive solution: by grafting onto previous models, UniPunc enables them to punctuate the mixed corpus. Our code is available at github.com/Yaoming95/UniPunc
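The core idea of the abstract — fusing text features with either real acoustic features or a shared learned stand-in when audio is absent, so one model handles both kinds of samples — can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual architecture: the mean-pooling fusion, the `virtual_audio` vector, and all dimensions and weights here are assumptions standing in for the cross-modal attention and trained parameters described in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

HIDDEN = 16
N_PUNC = 4  # e.g. none, comma, period, question mark

# Hypothetical random parameters; in the real model these are trained.
W_fuse = rng.normal(size=(2 * HIDDEN, HIDDEN))
W_out = rng.normal(size=(HIDDEN, N_PUNC))
# A shared learned embedding that non-audio samples use in place of
# acoustic features, so both sample kinds live in one latent space.
virtual_audio = rng.normal(size=(1, HIDDEN))

def punctuate(text_feats, audio_feats=None):
    """Fuse token-level text features (T, HIDDEN) with acoustic
    features (A, HIDDEN), or with the virtual embedding when audio
    is missing, and predict one punctuation tag per token."""
    if audio_feats is None:
        audio_feats = virtual_audio  # non-audio path shares this vector
    # Mean-pool the acoustic sequence and broadcast it to every token:
    # a crude stand-in for the paper's cross-modal interaction.
    pooled = audio_feats.mean(axis=0, keepdims=True)
    fused = np.concatenate(
        [text_feats, np.repeat(pooled, len(text_feats), axis=0)], axis=1)
    hidden = np.tanh(fused @ W_fuse)
    logits = hidden @ W_out
    return logits.argmax(axis=1)  # punctuation class index per token

# A single model punctuates both sample kinds:
with_audio = punctuate(rng.normal(size=(5, HIDDEN)), rng.normal(size=(9, HIDDEN)))
without_audio = punctuate(rng.normal(size=(5, HIDDEN)))
print(with_audio.shape, without_audio.shape)
```

The point of the sketch is the single forward path: samples lacking audio are not routed to a separate text-only model but reuse the same fusion machinery via the shared virtual embedding.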
