Paper Title
Do CoNLL-2003 Named Entity Taggers Still Work Well in 2023?
Paper Authors
Paper Abstract
The CoNLL-2003 English named entity recognition (NER) dataset has been widely used to train and evaluate NER models for almost 20 years. However, it is unclear how well models that are trained on this 20-year-old data and developed over a period of decades using the same test set will perform when applied to modern data. In this paper, we evaluate the generalization of over 20 different models trained on CoNLL-2003, and show that NER models generalize very differently. Surprisingly, we find no evidence of performance degradation in pre-trained Transformers, such as RoBERTa and T5, even when fine-tuned using decades-old data. We investigate why some models generalize well to new data while others do not, and attempt to disentangle the effects of temporal drift and overfitting due to test reuse. Our analysis suggests that most deterioration is due to temporal mismatch between the pre-training corpora and the downstream test sets. We find that four factors are important for good generalization: the amount of fine-tuning data, model architecture, number of parameters, and the time period of the pre-training corpus. We suggest that current evaluation methods have, in some sense, underestimated progress on NER over the past 20 years, as NER models have not only improved on the original CoNLL-2003 test set, but improved even more on modern data. Our datasets can be found at https://github.com/ShuhengL/acl2023_conllpp.