Paper Title
A Joint Learning Approach based on Self-Distillation for Keyphrase Extraction from Scientific Documents
Paper Authors
Abstract
Keyphrase extraction is the task of extracting a small set of phrases that best describe a document. Most existing benchmark datasets for the task have only a limited number of annotated documents, making it challenging to train increasingly complex neural networks. In contrast, digital libraries store millions of scientific articles online, covering a wide range of topics. While a significant portion of these articles contain keyphrases provided by their authors, most others lack such annotations. Therefore, to effectively utilize these large amounts of unlabeled articles, we propose a simple and efficient joint learning approach based on the idea of self-distillation. Experimental results show that our approach consistently improves the performance of baseline models for keyphrase extraction. Furthermore, our best models outperform previous methods for the task, achieving new state-of-the-art results on two public benchmarks: Inspec and SemEval-2017.
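The abstract does not spell out the training objective, but the general idea of self-distillation-based joint learning can be sketched as follows: a "teacher" snapshot of the model produces soft targets on unlabeled documents, and the same model (the "student") is trained on a weighted sum of the supervised loss over labeled data and a distillation loss over those soft targets. The function names, the cross-entropy formulation, and the `alpha` weighting below are illustrative assumptions, not details taken from the paper.

```python
import math

def cross_entropy(pred, target, eps=1e-12):
    """Cross-entropy between a predicted distribution and a target
    distribution (gold one-hot labels or a teacher's soft targets)."""
    return -sum(t * math.log(max(p, eps)) for p, t in zip(pred, target))

def joint_loss(student_pred_labeled, gold,
               student_pred_unlabeled, teacher_pred_unlabeled,
               alpha=0.5):
    """Joint objective: supervised loss on labeled data plus a
    self-distillation loss on unlabeled data.

    alpha is a hypothetical interpolation weight between the two terms;
    the paper's actual weighting scheme is not given in the abstract."""
    supervised = cross_entropy(student_pred_labeled, gold)
    distill = cross_entropy(student_pred_unlabeled, teacher_pred_unlabeled)
    return (1 - alpha) * supervised + alpha * distill
```

In this sketch, the distillation term lets the large pool of unlabeled scientific articles contribute a training signal even though no author-provided keyphrases exist for them.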