Paper Title
Large Product Key Memory for Pretrained Language Models
Paper Authors
Paper Abstract
Product key memory (PKM), proposed by Lample et al. (2019), makes it possible to improve prediction accuracy by efficiently increasing model capacity with negligible computational overhead. However, its empirical application has so far been limited to causal language modeling. Motivated by the recent success of pretrained language models (PLMs), we investigate how to incorporate a large PKM into PLMs that can be finetuned for a wide variety of downstream NLP tasks. We define a new memory usage metric, and careful observation with this metric reveals that most memory slots remain outdated during the training of PKM-augmented models. To address this issue, we propose simple but effective solutions: (1) initialization from model weights pretrained without memory and (2) augmenting the PKM by addition rather than replacing a feed-forward network. We verify that both are crucial for pretraining PKM-augmented PLMs, enhancing memory utilization and downstream performance. Code and pretrained weights are available at https://github.com/clovaai/pkm-transformers.
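For illustration only, below is a minimal PyTorch sketch of a single-head product key memory layer in the spirit of Lample et al. (2019) and of solution (2) from the abstract, i.e. adding the memory output to the existing feed-forward network rather than replacing it. This is not the authors' released implementation (see the repository above); the class and parameter names (ProductKeyMemory, PKMAugmentedFFN, num_keys, topk) are hypothetical, and details such as multi-head queries and query batch normalization are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ProductKeyMemory(nn.Module):
    """Single-head product key memory sketch: two codebooks of num_keys
    half-dimensional sub-keys jointly index num_keys**2 value slots."""

    def __init__(self, dim: int, num_keys: int = 128, topk: int = 32):
        super().__init__()
        assert dim % 2 == 0
        self.num_keys, self.topk = num_keys, topk
        half = dim // 2
        self.query_proj = nn.Linear(dim, dim)
        self.subkeys1 = nn.Parameter(torch.randn(num_keys, half) / half ** 0.5)
        self.subkeys2 = nn.Parameter(torch.randn(num_keys, half) / half ** 0.5)
        # num_keys**2 memory slots; EmbeddingBag sums the selected value vectors.
        self.values = nn.EmbeddingBag(num_keys ** 2, dim, mode="sum")

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        shape = x.shape
        q = self.query_proj(x.reshape(-1, shape[-1]))
        q1, q2 = q.chunk(2, dim=-1)
        # Top-k sub-key scores from each codebook: (n, topk).
        s1, i1 = (q1 @ self.subkeys1.t()).topk(self.topk, dim=-1)
        s2, i2 = (q2 @ self.subkeys2.t()).topk(self.topk, dim=-1)
        # Cartesian product of the two top-k sets -> topk**2 candidate slots.
        cand_scores = (s1.unsqueeze(-1) + s2.unsqueeze(-2)).flatten(1)
        cand_ids = (i1.unsqueeze(-1) * self.num_keys + i2.unsqueeze(-2)).flatten(1)
        scores, pos = cand_scores.topk(self.topk, dim=-1)
        slots = cand_ids.gather(-1, pos)
        weights = F.softmax(scores, dim=-1)
        # Softmax-weighted sum of the selected memory values.
        out = self.values(slots, per_sample_weights=weights)
        return out.reshape(shape)


class PKMAugmentedFFN(nn.Module):
    """Solution (2) as described in the abstract: keep the original feed-forward
    network and *add* the PKM output, instead of replacing the FFN with memory."""

    def __init__(self, dim: int, hidden_dim: int, **pkm_kwargs):
        super().__init__()
        self.ffn = nn.Sequential(nn.Linear(dim, hidden_dim), nn.GELU(),
                                 nn.Linear(hidden_dim, dim))
        self.pkm = ProductKeyMemory(dim, **pkm_kwargs)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.ffn(x) + self.pkm(x)


if __name__ == "__main__":
    layer = PKMAugmentedFFN(dim=64, hidden_dim=256, num_keys=16, topk=8)
    h = torch.randn(2, 10, 64)   # (batch, seq_len, hidden)
    print(layer(h).shape)        # torch.Size([2, 10, 64])
```

Solution (1), initializing from weights pretrained without memory, would correspond to loading an existing PLM checkpoint into the non-PKM parameters of such a model before continuing pretraining.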