Paper Title

Fusing finetuned models for better pretraining

Paper Authors

Leshem Choshen, Elad Venezian, Noam Slonim, Yoav Katz

Paper Abstract

Pretrained models are the standard starting point for training. This approach consistently outperforms the use of a random initialization. However, pretraining is a costly endeavour that few can undertake. In this paper, we create better base models at hardly any cost, by fusing multiple existing fine-tuned models into one. Specifically, we fuse by averaging the weights of these models. We show that the fused model's results surpass those of the pretrained model. We also show that fusing is often better than intertraining. We find that fusing is less dependent on the target task. Furthermore, weight decay nullifies intertraining effects but not those of fusing.
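The abstract states that fusing is done by averaging the weights of several fine-tuned models that share the same architecture. The sketch below is a minimal illustration of that idea in PyTorch, not the paper's released code; the function name, checkpoint paths, and the choice to copy non-float buffers from the first checkpoint are assumptions made for illustration.

```python
import torch


def fuse_state_dicts(state_dicts):
    """Average the parameters of several fine-tuned checkpoints of the same base model."""
    fused = {}
    for key in state_dicts[0]:
        tensors = [sd[key] for sd in state_dicts]
        if tensors[0].is_floating_point():
            # Elementwise mean of the corresponding weight tensors across checkpoints.
            fused[key] = torch.stack(tensors).mean(dim=0)
        else:
            # Non-float entries (e.g. integer counters) are copied from the first checkpoint (assumption).
            fused[key] = tensors[0]
    return fused


# Hypothetical usage: load several fine-tuned checkpoints of the same base model,
# average their weights, and use the result as the base model for further fine-tuning.
# checkpoints = ["model_a.pt", "model_b.pt", "model_c.pt"]  # illustrative paths
# state_dicts = [torch.load(p, map_location="cpu") for p in checkpoints]
# model.load_state_dict(fuse_state_dicts(state_dicts))
```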
