Paper Title

ColD Fusion: Collaborative Descent for Distributed Multitask Finetuning

Authors

Shachar Don-Yehiya, Elad Venezian, Colin Raffel, Noam Slonim, Yoav Katz, Leshem Choshen

Abstract

We propose a new paradigm to continually evolve pretrained models, denoted ColD Fusion. It provides the benefits of multitask learning but leverages distributed computation with limited communication and eliminates the need for shared data. Consequently, ColD Fusion can give rise to a synergistic loop, where finetuned models can be recycled to continually improve the pretrained model they are based upon. We show that ColD Fusion yields comparable benefits to multitask training by producing a model that (a) attains strong performance on all of the datasets it was trained on; and (b) is a better starting point for finetuning on unseen datasets. We show that ColD Fusion outperforms RoBERTa and even previous multitask models. Specifically, when training and testing on 35 diverse datasets, a ColD Fusion-based model outperforms RoBERTa by 2.33 points on average without any changes to the architecture.
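The abstract describes an iterative loop in which contributors finetune copies of a shared base model on their own data and send back only the resulting weights, which are then fused into the next shared base. Below is a minimal PyTorch sketch of one such round, assuming the fusion step is simple parameter averaging (the abstract does not specify the fusion operator); the tiny nn.Linear models stand in for pretrained Transformers, and all names here are illustrative rather than taken from the paper's code.

```python
import torch
import torch.nn as nn

def fuse(contributors: list[nn.Module]) -> dict:
    """Average the parameters of the contributor models into a fused state dict.
    Plain weight averaging is an assumption here; the abstract only says that
    finetuned models are 'recycled' to improve the shared pretrained model."""
    states = [m.state_dict() for m in contributors]
    return {
        name: torch.stack([s[name].float() for s in states]).mean(dim=0)
        for name in states[0]
    }

# Toy round of the loop: each contributor copies the shared base, finetunes
# locally on its own private data (omitted), and sends back only its weights.
base = nn.Linear(8, 2)                          # stand-in for a pretrained model
contributors = []
for _ in range(3):
    local = nn.Linear(8, 2)
    local.load_state_dict(base.state_dict())    # start from the current shared base
    # ... local finetuning on a private dataset would happen here ...
    contributors.append(local)

base.load_state_dict(fuse(contributors))        # fused weights become the next base
```

In such a loop the fused checkpoint would be redistributed and the round repeated, so communication is limited to model weights and no training data is ever shared, which is the property the abstract emphasizes.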
