Title

Syntax-driven Data Augmentation for Named Entity Recognition

Authors

Sutiono, Arie Pratama; Hahn-Powell, Gus

Abstract

In low-resource settings, data augmentation strategies are commonly leveraged to improve performance. Numerous approaches have attempted document-level augmentation (e.g., text classification), but few studies have explored token-level augmentation. Performed naively, data augmentation can produce semantically incongruent and ungrammatical examples. In this work, we compare simple masked language model replacement and an augmentation method using constituency tree mutations to improve the performance of named entity recognition in low-resource settings, with the aim of preserving the linguistic cohesion of the augmented sentences.
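To make the idea of token-level augmentation concrete, the following is a minimal sketch, not the authors' implementation. It assumes BIO-style labels and assumes that only non-entity ("O") tokens are replaced, so that the label sequence stays valid. The names `augment_non_entity_tokens`, `propose`, and `toy_propose` are hypothetical; `toy_propose` is a small lookup table standing in for a real masked language model.

```python
import random
from typing import Callable, List, Tuple


def augment_non_entity_tokens(
    tokens: List[str],
    labels: List[str],
    propose: Callable[[List[str], int], str],
    rate: float = 0.3,
    seed: int = 0,
) -> Tuple[List[str], List[str]]:
    """Replace a fraction of non-entity ("O") tokens; labels are left unchanged.

    `propose(tokens, i)` is a hypothetical callback that returns a replacement
    for the token at position i (for example, a masked language model's guess).
    """
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")
    rng = random.Random(seed)
    new_tokens = list(tokens)
    for i, label in enumerate(labels):
        # Assumption: entity tokens are never touched, so labels stay aligned.
        if label == "O" and rng.random() < rate:
            new_tokens[i] = propose(tokens, i)
    return new_tokens, list(labels)


# Hypothetical stand-in for a masked language model: a tiny substitution table.
SUBSTITUTES = {"visited": "toured", "yesterday": "recently"}


def toy_propose(tokens: List[str], i: int) -> str:
    """Return a substitute for tokens[i], or the token itself if none is known."""
    return SUBSTITUTES.get(tokens[i], tokens[i])


tokens = ["Alice", "visited", "Paris", "yesterday"]
labels = ["B-PER", "O", "B-LOC", "O"]
aug_tokens, aug_labels = augment_non_entity_tokens(
    tokens, labels, toy_propose, rate=1.0
)
print(aug_tokens)
print(aug_labels)
```

In practice the `propose` callback would be backed by an actual masked language model. The constituency-based method mentioned in the abstract would instead operate on parse trees, which this sketch does not attempt to cover.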
