Paper Title

BERT WEAVER: Using WEight AVERaging to enable lifelong learning for transformer-based models in biomedical semantic search engines

Paper Authors

Lisa Kühnel, Alexander Schulz, Barbara Hammer, Juliane Fluck

Abstract

Recent developments in transfer learning have boosted advancements in natural language processing tasks. Performance, however, depends on high-quality, manually annotated training data. Especially in the biomedical domain, it has been shown that a single training corpus is not enough to learn generic models that can predict reliably on new data. Therefore, to be usable in real-world applications, state-of-the-art models need the capability of lifelong learning, improving performance as soon as new data become available, without re-training the whole model from scratch. We present WEAVER, a simple yet efficient post-processing method that infuses old knowledge into the new model, thereby reducing catastrophic forgetting. We show that applying WEAVER sequentially yields word embedding distributions similar to those obtained by combined training on all data at once, while being computationally more efficient. Because no data sharing is required, the presented method is also readily applicable to federated learning settings and can, for example, benefit the mining of electronic health records from different clinics.
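The core operation the abstract describes, merging the weights of the previously trained model with a newly fine-tuned one instead of re-training on all data, can be illustrated with a short sketch. This is a minimal illustration assuming PyTorch-style state dicts; the function name `weave` and the equal mixing coefficient `alpha` are assumptions made for this example, not the authors' released implementation, which may weight the models differently.

```python
import torch

def weave(old_state: dict, new_state: dict, alpha: float = 0.5) -> dict:
    """Merge two checkpoints by parameter-wise weighted averaging.

    alpha controls how much of the old model is retained; alpha = 0.5
    is a plain average, chosen here purely for illustration. Averaging
    the old weights into the new ones preserves previously learned
    knowledge without needing access to the old training data.
    """
    merged = {}
    for name, old_param in old_state.items():
        merged[name] = alpha * old_param + (1.0 - alpha) * new_state[name]
    return merged

# Sequential use: after fine-tuning on each new corpus, fold the
# fine-tuned weights back into the running model, e.g.:
#   model.load_state_dict(weave(model.state_dict(), finetuned.state_dict()))
```

Because the merge needs only the two sets of weights, each site in a federated setting can fine-tune locally and exchange checkpoints rather than raw records, which is what makes the approach applicable to electronic health records from different clinics.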
