Paper Title

A Noise-Robust Loss for Unlabeled Entity Problem in Named Entity Recognition

Authors

Wentao Kang, Guijun Zhang, Xiao Fu

Abstract

Named Entity Recognition (NER) is an important task in natural language processing. However, traditional supervised NER requires large-scale annotated datasets. Distant supervision has been proposed to reduce this demand for annotated data, but datasets constructed in this way are extremely noisy and suffer from a serious unlabeled entity problem. The cross-entropy (CE) loss function is highly sensitive to unlabeled entities, leading to severe performance degradation. As an alternative, we propose a new loss function called NRCES to cope with this problem. A sigmoid term is used to mitigate the negative impact of noise. In addition, we balance the convergence and noise tolerance of the model according to the samples and the training process. Experiments on synthetic and real-world datasets demonstrate that our approach is highly robust when the unlabeled entity problem is severe, achieving a new state of the art on real-world datasets.
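
The abstract does not give the NRCES formula, so the following is only a minimal sketch of the general idea that a sigmoid term can dampen the loss on suspicious labels. It is not the authors' loss. The function `sigmoid_damped_ce` and its parameters `sharpness` and `threshold` are hypothetical names introduced here for illustration. The scenario assumed is a token that is really an entity but was left unlabeled (tagged "O"): a reasonably trained model assigns that noisy label a low probability, and plain CE then produces a large loss that pushes the model toward the wrong label.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def cross_entropy(p_label: float) -> float:
    """Standard CE for one token: -log of the probability the model
    assigns to the (possibly noisy) annotated label."""
    return -math.log(p_label)


def sigmoid_damped_ce(p_label: float,
                      sharpness: float = 10.0,
                      threshold: float = 0.5) -> float:
    """Hypothetical illustration, NOT the NRCES formula from the paper.

    The CE term is multiplied by a sigmoid weight that shrinks toward 0
    when the model strongly disagrees with the label (small p_label),
    which is where unlabeled-entity noise is assumed to concentrate.
    `sharpness` and `threshold` are assumed hyperparameters.
    """
    weight = sigmoid(sharpness * (p_label - threshold))
    return weight * cross_entropy(p_label)


if __name__ == "__main__":
    # Compare both losses for a confident match, a borderline case,
    # and a strong disagreement with the annotated label.
    for p in (0.9, 0.5, 0.05):
        print(f"p_label={p:.2f}  CE={cross_entropy(p):.4f}  "
              f"damped={sigmoid_damped_ce(p):.4f}")
```

In this sketch, a token whose label the model agrees with keeps almost its full CE loss, while a token whose label the model strongly rejects contributes very little. The same damping would also slow learning on hard but correctly labeled tokens, which is one way to read the trade-off between convergence and noise tolerance mentioned in the abstract.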
