Paper Title

G-MAP: General Memory-Augmented Pre-trained Language Model for Domain Tasks

Paper Authors

Zhongwei Wan, Yichun Yin, Wei Zhang, Jiaxin Shi, Lifeng Shang, Guangyong Chen, Xin Jiang, Qun Liu

Paper Abstract

Recently, domain-specific PLMs have been proposed to boost task performance in specific domains (e.g., biomedicine and computer science) by continuing to pre-train general PLMs on domain-specific corpora. However, this Domain-Adaptive Pre-Training (DAPT; Gururangan et al. (2020)) tends to forget the general knowledge previously acquired by the general PLM, leading to catastrophic forgetting and sub-optimal performance. To alleviate this problem, we propose a new framework, the General Memory-Augmented Pre-trained Language Model (G-MAP), which augments the domain-specific PLM with a memory representation built from a frozen general PLM, so that no general knowledge is lost. Specifically, we propose a new memory-augmented layer and, based on it, explore different augmentation strategies to build the memory representation, which is then adaptively fused into the domain-specific PLM. We demonstrate the effectiveness of G-MAP across various domains (biomedical and computer science publications, news, and reviews) and task types (text classification, QA, NER), and extensive results show that the proposed G-MAP achieves SOTA results on all tasks.
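To make the architecture concrete, below is a minimal PyTorch sketch of a memory-augmented layer in the spirit the abstract describes: the domain-specific PLM's hidden states attend over "memory" hidden states from a frozen general PLM, and a learned gate adaptively fuses the result back in. The class name, the cross-attention-plus-gate design, and all dimensions are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

class MemoryAugmentedLayer(nn.Module):
    """Sketch of a G-MAP-style memory-augmented layer (illustrative, not the
    authors' exact design). Domain hidden states query memory hidden states
    from a frozen general PLM; a learned gate controls the fusion."""

    def __init__(self, hidden_size: int, num_heads: int = 8):
        super().__init__()
        # Cross-attention: domain states are queries; memory states are keys/values.
        self.cross_attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)
        # Per-position scalar gate computed from both representations.
        self.gate = nn.Linear(2 * hidden_size, 1)
        self.norm = nn.LayerNorm(hidden_size)

    def forward(self, domain_states: torch.Tensor, memory_states: torch.Tensor) -> torch.Tensor:
        # domain_states: (batch, seq_len, hidden) from the domain-specific PLM.
        # memory_states: (batch, mem_len, hidden) from the frozen general PLM.
        attended, _ = self.cross_attn(domain_states, memory_states, memory_states)
        g = torch.sigmoid(self.gate(torch.cat([domain_states, attended], dim=-1)))
        # Adaptive fusion: gate how much general-PLM memory flows into the domain stream.
        return self.norm(domain_states + g * attended)

# Usage sketch: the general PLM stays frozen, so its memory representation
# preserves general knowledge; only the fusion parameters are trained here.
layer = MemoryAugmentedLayer(hidden_size=768)
domain_h = torch.randn(2, 128, 768)      # stand-in for domain-PLM hidden states
with torch.no_grad():
    memory_h = torch.randn(2, 128, 768)  # stand-in for frozen general-PLM outputs
fused = layer(domain_h, memory_h)        # (2, 128, 768)
```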
