Paper Title
Scaling Laws and Interpretability of Learning from Repeated Data
Paper Authors
Paper Abstract
Recent large language models have been trained on vast datasets, but also often on repeated data, either intentionally for the purpose of upweighting higher quality data, or unintentionally because data deduplication is not perfect and the model is exposed to repeated data at the sentence, paragraph, or document level. Some works have reported substantial negative performance effects of this repeated data. In this paper we attempt to study repeated data systematically and to understand its effects mechanistically. To do this, we train a family of models where most of the data is unique but a small fraction of it is repeated many times. We find a strong double descent phenomenon, in which repeated data can lead test loss to increase midway through training. A predictable range of repetition frequency leads to surprisingly severe degradation in performance. For instance, performance of an 800M parameter model can be degraded to that of a 2x smaller model (400M params) by repeating 0.1% of the data 100 times, despite the other 90% of the training tokens remaining unique. We suspect there is a range in the middle where the data can be memorized and doing so consumes a large fraction of the model's capacity, and this may be where the peak of degradation occurs. Finally, we connect these observations to recent mechanistic interpretability work - attempting to reverse engineer the detailed computations performed by the model - by showing that data repetition disproportionately damages copying and internal structures associated with generalization, such as induction heads, providing a possible mechanism for the shift from generalization to memorization. Taken together, these results provide a hypothesis for why repeating a relatively small fraction of data in large language models could lead to disproportionately large harms to performance.
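To make the token accounting in the abstract concrete, the sketch below builds a hypothetical training mixture in which a small fraction of distinct documents is repeated many times, and reports what share of the total token budget the repeated subset occupies (0.1% of the data repeated 100 times fills roughly 10% of the tokens, leaving about 90% unique). This is a minimal illustrative assumption about the setup; the corpus, fractions, and function names are not the paper's actual training code.

```python
# Minimal sketch (assumed setup, not the paper's code): build a training-token
# stream where `repeated_fraction` of the unique documents is repeated
# `num_repeats` times, then report how much of the final token budget the
# repeated subset consumes.
import random

def build_repeated_mixture(unique_docs, repeated_fraction=0.001, num_repeats=100, seed=0):
    rng = random.Random(seed)
    docs = list(unique_docs)
    rng.shuffle(docs)

    n_repeated = max(1, int(len(docs) * repeated_fraction))
    repeated_subset = docs[:n_repeated]   # small slice seen many times
    unique_subset = docs[n_repeated:]     # everything else seen once

    mixture = unique_subset + repeated_subset * num_repeats
    rng.shuffle(mixture)

    repeated_tokens = num_repeats * sum(len(d.split()) for d in repeated_subset)
    total_tokens = sum(len(d.split()) for d in mixture)
    return mixture, repeated_tokens / total_tokens

if __name__ == "__main__":
    corpus = [f"document {i} with some filler words" for i in range(100_000)]
    _, repeated_share = build_repeated_mixture(corpus)
    # With 0.1% of documents repeated 100x, roughly 10% of all training
    # tokens come from the repeated subset, so about 90% remain unique.
    print(f"repeated-token share: {repeated_share:.1%}")
```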