Paper Title
Space-Efficient Representation of Entity-centric Query Language Models
Paper Authors
Paper Abstract
Virtual assistants make use of automatic speech recognition (ASR) to help users answer entity-centric queries. However, spoken entity recognition is a difficult problem, due to the large number of frequently-changing named entities. In addition, resources available for recognition are constrained when ASR is performed on-device. In this work, we investigate the use of probabilistic grammars as language models within the finite-state transducer (FST) framework. We introduce a deterministic approximation to probabilistic grammars that avoids the explicit expansion of non-terminals at model creation time, integrates directly with the FST framework, and is complementary to n-gram models. We obtain a 10% relative word error rate improvement on long tail entity queries compared to when a similarly-sized n-gram model is used without our method.
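The following is a minimal toy sketch, not the paper's implementation, meant only to illustrate the contrast the abstract describes: expanding a non-terminal (here a stand-in entity class called `$APP`) eagerly at model creation time versus resolving it lazily at query time. The templates, entity lists, and probabilities are invented for illustration; in the paper this idea lives inside the FST framework rather than in plain Python.

```python
# Toy illustration (hypothetical data): eager vs. lazy expansion of a
# non-terminal entity class in a tiny probabilistic grammar.
import math
from itertools import product

# Top-level templates reference the non-terminal $APP, which stands in for
# a large, frequently-changing entity list.
TEMPLATES = [
    (["open", "$APP"], 0.6),
    (["play", "music", "in", "$APP"], 0.4),
]
ENTITIES = {
    "$APP": [(["safari"], 0.5), (["app", "store"], 0.5)],
}

def eager_expand():
    """Expand every non-terminal up front; the model grows with the entity list."""
    expanded = []
    for words, p in TEMPLATES:
        slots = [ENTITIES[w] if w in ENTITIES else [([w], 1.0)] for w in words]
        for combo in product(*slots):
            sentence = [w for fragment, _ in combo for w in fragment]
            logp = math.log(p) + sum(math.log(q) for _, q in combo)
            expanded.append((sentence, logp))
    return expanded

def lazy_score(query):
    """Score a query, substituting entities only when a non-terminal is reached."""
    best = float("-inf")
    for words, p in TEMPLATES:
        logp, i, ok = math.log(p), 0, True
        for w in words:
            if w in ENTITIES:
                hit = next(
                    ((frag, q) for frag, q in ENTITIES[w]
                     if query[i:i + len(frag)] == frag),
                    None,
                )
                if hit is None:
                    ok = False
                    break
                frag, q = hit
                logp += math.log(q)
                i += len(frag)
            elif i < len(query) and query[i] == w:
                i += 1
            else:
                ok = False
                break
        if ok and i == len(query):
            best = max(best, logp)
    return best

if __name__ == "__main__":
    print(len(eager_expand()))                    # grows with len(ENTITIES["$APP"])
    print(lazy_score("open app store".split()))   # scored without pre-expansion
```

In the paper's setting, the analogue of `lazy_score` is a deterministic approximation realized with FST operations, so the entity lists never have to be spliced into the word-level model ahead of time; the sketch above only conveys why that keeps the representation space-efficient.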