Paper Title
Context Dependent RNNLM for Automatic Transcription of Conversations
Paper Authors
Paper Abstract
Conversational speech, while being unstructured at an utterance level, typically has a macro topic which provides larger context spanning multiple utterances. The current language models in speech recognition systems using recurrent neural networks (RNNLM) rely mainly on the local context and exclude the larger context. In order to model the long-term dependencies of words across multiple sentences, we propose a novel architecture where the words from prior utterances are converted to an embedding. The relevance of these embeddings for the prediction of the next word in the current sentence is found using a gating network. The relevance-weighted context embedding vector is combined in the language model to improve the next-word prediction, and the entire model, including the context embedding and the relevance weighting layers, is jointly learned for a conversational language modeling task. Experiments are performed on two conversational datasets - the AMI corpus and the Switchboard corpus. In these tasks, we illustrate that the proposed approach yields significant improvements in language model perplexity over the RNNLM baseline. In addition, the use of the proposed conversational LM for ASR rescoring results in an absolute WER reduction of $1.2$\% on the Switchboard dataset and $1.0$\% on the AMI dataset over the RNNLM-based ASR baseline.
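To make the described architecture concrete, the sketch below shows one plausible way to gate a context embedding built from prior utterances into an RNNLM's next-word prediction. It is a minimal illustration, not the paper's implementation: the class name, layer sizes, the mean-pooled context encoder, and the sigmoid gate are assumptions chosen for clarity.

```python
import torch
import torch.nn as nn

class ContextGatedRNNLM(nn.Module):
    """Illustrative sketch: an RNNLM whose next-word prediction is augmented
    with a relevance-gated embedding of words from prior utterances.
    All names and dimensions are hypothetical, not taken from the paper."""

    def __init__(self, vocab_size, emb_dim=256, hidden_dim=512, ctx_dim=256):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        # Context encoder (assumption): mean of word embeddings from prior
        # utterances, projected to a fixed-size context vector.
        self.ctx_proj = nn.Linear(emb_dim, ctx_dim)
        # Gating network: scores the relevance of the context vector
        # given the current RNN state at each time step.
        self.gate = nn.Linear(hidden_dim + ctx_dim, ctx_dim)
        self.out = nn.Linear(hidden_dim + ctx_dim, vocab_size)

    def forward(self, cur_tokens, prior_tokens):
        # cur_tokens:   (batch, T)      word ids of the current sentence
        # prior_tokens: (batch, T_ctx)  word ids from previous utterances
        ctx = self.ctx_proj(self.word_emb(prior_tokens).mean(dim=1))   # (batch, ctx_dim)
        h, _ = self.rnn(self.word_emb(cur_tokens))                     # (batch, T, hidden_dim)
        ctx = ctx.unsqueeze(1).expand(-1, h.size(1), -1)               # broadcast over time
        g = torch.sigmoid(self.gate(torch.cat([h, ctx], dim=-1)))      # relevance weights
        fused = torch.cat([h, g * ctx], dim=-1)                        # relevance-weighted context
        return self.out(fused)                                         # next-word logits
```

In this reading, the whole model (word embeddings, context projection, gate, and output layer) is trained jointly with a standard cross-entropy next-word loss, matching the abstract's claim that the context embedding and relevance weighting layers are learned together with the language model.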