Title
E.T.: Entity-Transformers. Coreference augmented Neural Language Model for richer mention representations via Entity-Transformer blocks
Authors
Abstract
In the last decade, the field of Neural Language Modelling has witnessed enormous changes, with the development of novel models through the use of Transformer architectures. However, even these models struggle to model long sequences due to memory constraints and increasing computational complexity. Coreference annotations over the training data can provide context far beyond the modelling limitations of such language models. In this paper, we present an extension of the Transformer-block architecture used in neural language models, specifically in GPT2, in order to incorporate entity annotations during training. Our model, GPT2E, extends the Transformer-layer architecture of GPT2 to Entity-Transformers, an architecture designed to handle coreference information when present. In doing so, we achieve richer representations for entity mentions at negligible additional training cost. We compare the performance of GPT2 and GPT2E in terms of perplexity on the CoNLL 2012 and LAMBADA datasets, as well as the key differences in the entity representations and their effect on downstream tasks such as Named Entity Recognition. Furthermore, our approach can be adopted by the majority of Transformer-based language models.
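To make the general idea concrete, the following is a minimal, hypothetical sketch (not the paper's exact mechanism) of a GPT2-style Transformer block that consumes coreference annotations when they are present. It assumes the coreference signal arrives as per-token entity-cluster ids and is folded into the hidden states as an extra embedding at mention positions; the class name `EntityTransformerBlock` and the parameters `entity_ids` and `n_entities` are illustrative assumptions.

```python
# Hypothetical sketch of an entity-augmented Transformer block (PyTorch).
# The entity-mixing step is an assumption for illustration only.
from typing import Optional

import torch
import torch.nn as nn


class EntityTransformerBlock(nn.Module):
    """GPT2-style block that can fold an entity embedding into its hidden states."""

    def __init__(self, d_model: int, n_heads: int, n_entities: int, ffn_mult: int = 4):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, ffn_mult * d_model),
            nn.GELU(),
            nn.Linear(ffn_mult * d_model, d_model),
        )
        # Entity id 0 is reserved for "no mention"; its embedding stays at zero.
        self.entity_emb = nn.Embedding(n_entities + 1, d_model, padding_idx=0)
        self.ln_ent = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor, entity_ids: Optional[torch.Tensor] = None) -> torch.Tensor:
        # x: (batch, seq, d_model); entity_ids: (batch, seq) coreference cluster ids or None.
        seq_len = x.size(1)
        causal_mask = torch.triu(
            torch.ones(seq_len, seq_len, dtype=torch.bool, device=x.device), diagonal=1
        )
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=causal_mask, need_weights=False)
        x = x + attn_out
        if entity_ids is not None:
            # Add the coreference-cluster embedding only at annotated mention positions.
            x = x + self.ln_ent(self.entity_emb(entity_ids))
        x = x + self.mlp(self.ln2(x))
        return x


# Usage: tokens whose entity id is 0 receive no entity signal.
block = EntityTransformerBlock(d_model=768, n_heads=12, n_entities=1000)
hidden = torch.randn(2, 16, 768)
entity_ids = torch.zeros(2, 16, dtype=torch.long)
entity_ids[0, 3] = 7  # token 3 in sequence 0 mentions entity cluster 7
out = block(hidden, entity_ids)
print(out.shape)  # torch.Size([2, 16, 768])
```

Because the entity path is a no-op when `entity_ids` is absent (or all zeros), a block of this shape degrades to a plain GPT2 layer on unannotated text, which matches the abstract's claim that coreference information is handled only "when present."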