Paper Title
Empower Entity Set Expansion via Language Model Probing
Paper Authors
Paper Abstract
Entity set expansion, aiming at expanding a small seed entity set with new entities belonging to the same semantic class, is a critical task that benefits many downstream NLP and IR applications, such as question answering, query understanding, and taxonomy construction. Existing set expansion methods bootstrap the seed entity set by adaptively selecting context features and extracting new entities. A key challenge for entity set expansion is to avoid selecting ambiguous context features which will shift the class semantics and lead to accumulative errors in later iterations. In this study, we propose a novel iterative set expansion framework that leverages automatically generated class names to address the semantic drift issue. In each iteration, we select one positive and several negative class names by probing a pre-trained language model, and further score each candidate entity based on selected class names. Experiments on two datasets show that our framework generates high-quality class names and outperforms previous state-of-the-art methods significantly.
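The iterative procedure the abstract describes (probe a pre-trained LM for one positive and several negative class names, then score each candidate entity against them) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the Hearst-style query template, `TOY_SCORES`, and `lm_score` are fabricated stand-ins for a real masked language model.

```python
# Hedged sketch of one expansion iteration: probe for class names with a
# Hearst-style query, then score candidates against the positive class name
# while penalizing the negative ones. All probabilities are fabricated
# stand-ins for a real pre-trained masked LM's fill-mask outputs.

TOY_SCORES = {
    # P(class name fills [MASK] | "[MASK] such as apple, banana, pear")
    ("fruits", "[MASK] such as apple, banana, pear"): 0.60,
    ("foods", "[MASK] such as apple, banana, pear"): 0.30,
    ("companies", "[MASK] such as apple, banana, pear"): 0.05,
    # P(entity fills [MASK] | "<class name> such as [MASK]")
    ("cherry", "fruits such as [MASK]"): 0.50,
    ("bread", "fruits such as [MASK]"): 0.10,
    ("cherry", "foods such as [MASK]"): 0.30,
    ("bread", "foods such as [MASK]"): 0.40,
    ("cherry", "companies such as [MASK]"): 0.01,
    ("bread", "companies such as [MASK]"): 0.01,
}

def lm_score(word, query):
    """Toy substitute for a masked-LM fill-mask probability."""
    return TOY_SCORES.get((word, query), 0.0)

def probe_class_names(seeds, class_vocab, score_fn):
    """Rank candidate class names by how well they fill the masked slot
    in a Hearst-style query built from the seed entities."""
    query = "[MASK] such as " + ", ".join(seeds)
    ranked = sorted(class_vocab, key=lambda c: score_fn(c, query), reverse=True)
    return ranked[0], ranked[1:]  # one positive name, the rest negative

def score_entity(entity, positive, negatives, score_fn):
    """Score a candidate entity: reward fit with the positive class name,
    penalize fit with the best-matching negative class name."""
    pos = score_fn(entity, positive + " such as [MASK]")
    neg = max(score_fn(entity, n + " such as [MASK]") for n in negatives)
    return pos - neg

seeds = ["apple", "banana", "pear"]
positive, negatives = probe_class_names(
    seeds, ["fruits", "foods", "companies"], lm_score
)
# "cherry" fits the positive class "fruits"; "bread" fits the negative
# class "foods" better, so its score is pushed down.
```

The negative class names are what counters semantic drift: an ambiguous candidate that also matches a sibling class (here, "bread" under "foods") is penalized rather than silently absorbed into the set.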