Paper Title
Space-Efficient Representation of Entity-centric Query Language Models
Paper Authors
Paper Abstract
Virtual assistants make use of automatic speech recognition (ASR) to help users answer entity-centric queries. However, spoken entity recognition is a difficult problem, due to the large number of frequently-changing named entities. In addition, resources available for recognition are constrained when ASR is performed on-device. In this work, we investigate the use of probabilistic grammars as language models within the finite-state transducer (FST) framework. We introduce a deterministic approximation to probabilistic grammars that avoids the explicit expansion of non-terminals at model creation time, integrates directly with the FST framework, and is complementary to n-gram models. We obtain a 10% relative word error rate improvement on long tail entity queries compared to when a similarly-sized n-gram model is used without our method.
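The following is a minimal toy sketch, not the paper's implementation, meant only to illustrate the contrast the abstract describes: expanding a non-terminal (here a stand-in entity class called `$APP`) eagerly at model creation time versus resolving it lazily at query time. The templates, entity lists, and probabilities are invented for illustration; in the paper this idea lives inside the FST framework rather than in plain Python.

```python
# Toy illustration (hypothetical data): eager vs. lazy expansion of a
# non-terminal entity class in a tiny probabilistic grammar.
import math
from itertools import product

# Top-level templates reference the non-terminal $APP, which stands in for
# a large, frequently-changing entity list.
TEMPLATES = [
    (["open", "$APP"], 0.6),
    (["play", "music", "in", "$APP"], 0.4),
]
ENTITIES = {
    "$APP": [(["safari"], 0.5), (["app", "store"], 0.5)],
}

def eager_expand():
    """Expand every non-terminal up front; the model grows with the entity list."""
    expanded = []
    for words, p in TEMPLATES:
        slots = [ENTITIES[w] if w in ENTITIES else [([w], 1.0)] for w in words]
        for combo in product(*slots):
            sentence = [w for fragment, _ in combo for w in fragment]
            logp = math.log(p) + sum(math.log(q) for _, q in combo)
            expanded.append((sentence, logp))
    return expanded

def lazy_score(query):
    """Score a query, substituting entities only when a non-terminal is reached."""
    best = float("-inf")
    for words, p in TEMPLATES:
        logp, i, ok = math.log(p), 0, True
        for w in words:
            if w in ENTITIES:
                hit = next(
                    ((frag, q) for frag, q in ENTITIES[w]
                     if query[i:i + len(frag)] == frag),
                    None,
                )
                if hit is None:
                    ok = False
                    break
                frag, q = hit
                logp += math.log(q)
                i += len(frag)
            elif i < len(query) and query[i] == w:
                i += 1
            else:
                ok = False
                break
        if ok and i == len(query):
            best = max(best, logp)
    return best

if __name__ == "__main__":
    print(len(eager_expand()))                    # grows with len(ENTITIES["$APP"])
    print(lazy_score("open app store".split()))   # scored without pre-expansion
```

In the paper's setting, the analogue of `lazy_score` is a deterministic approximation realized with FST operations, so the entity lists never have to be spliced into the word-level model ahead of time; the sketch above only conveys why that keeps the representation space-efficient.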