Paper Title
Insights into Pre-training via Simpler Synthetic Tasks
Paper Authors
Paper Abstract
Pre-training produces representations that are effective for a wide range of downstream tasks, but it is still unclear what properties of pre-training are necessary for effective gains. Notably, recent work shows that even pre-training on synthetic tasks can achieve significant gains in downstream tasks. In this work, we perform three experiments that iteratively simplify pre-training and show that the simplifications still retain much of its gains. First, building on prior work, we perform a systematic evaluation of three existing synthetic pre-training methods on six downstream tasks. We find the best synthetic pre-training method, LIME, attains an average of $67\%$ of the benefits of natural pre-training. Second, to our surprise, we find that pre-training on a simple and generic synthetic task defined by the Set function achieves $65\%$ of the benefits, almost matching LIME. Third, we find that $39\%$ of the benefits can be attained by using merely the parameter statistics of synthetic pre-training. We release the source code at https://github.com/felixzli/synthetic_pretraining.
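To make the "simple and generic synthetic task defined by the Set function" concrete, below is a minimal sketch of what generating data for such a task could look like. This is not the authors' implementation (see the linked repository for that); the vocabulary size, sequence length, and the exact target convention (here, keeping each token's first occurrence) are assumptions made for illustration only.

```python
# A minimal sketch (not the authors' code) of a "Set"-style synthetic task:
# the input is a random token sequence, and the target is the sequence with
# duplicate tokens removed, preserving order of first occurrence.
import random


def make_set_example(vocab_size=100, seq_len=20, rng=random):
    """Return one (input_tokens, target_tokens) pair for the toy Set task."""
    inputs = [rng.randrange(vocab_size) for _ in range(seq_len)]
    seen = set()
    targets = []
    for tok in inputs:
        if tok not in seen:  # keep first occurrence, drop repeats
            seen.add(tok)
            targets.append(tok)
    return inputs, targets


if __name__ == "__main__":
    x, y = make_set_example()
    print("input :", x)
    print("target:", y)
```

A sequence-to-sequence model pre-trained on pairs like these sees no natural-language data at all, which is what makes the reported $65\%$ of natural pre-training's benefits notable.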