Paper Title

Early Stopping in Deep Networks: Double Descent and How to Eliminate it

Authors

Reinhard Heckel, Fatih Furkan Yilmaz

Abstract


Over-parameterized models, such as large deep networks, often exhibit a double descent phenomenon, whereby, as a function of model size, the error first decreases, increases, and then decreases again. This intriguing double descent behavior also occurs as a function of training epochs and has been conjectured to arise because training epochs control the model complexity. In this paper, we show that such epoch-wise double descent arises for a different reason: It is caused by a superposition of two or more bias-variance tradeoffs that arise because different parts of the network are learned at different epochs, and eliminating this by proper scaling of stepsizes can significantly improve the early stopping performance. We show this analytically for i) linear regression, where differently scaled features give rise to a superposition of bias-variance tradeoffs, and for ii) a two-layer neural network, where the first and second layer each govern a bias-variance tradeoff. Inspired by this theory, we study two standard convolutional networks empirically and show that eliminating epoch-wise double descent through adjusting stepsizes of different layers improves the early stopping performance significantly.
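The actionable idea in the abstract is to rescale the stepsizes of different layers so that different parts of the network are learned on comparable time scales. Below is a minimal, hypothetical sketch of that idea in PyTorch using parameter groups with different learning rates; the layer split, the 0.1 scaling factor, and the optimizer settings are illustrative assumptions, not the paper's exact configuration.

```python
# A minimal sketch (not the authors' exact recipe): assign a smaller stepsize to a
# "late" part of a small CNN via PyTorch parameter groups, so that the parts of the
# network governing different bias-variance tradeoffs train on a comparable time scale.
# The architecture, the 0.1x factor, and the SGD settings are illustrative assumptions.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1),  # "early" layers
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(32 * 32 * 32, 10),                 # "late" layer (e.g., the classifier)
)

early_params = list(model[0].parameters())
late_params = list(model[3].parameters())

base_lr = 0.01
optimizer = torch.optim.SGD(
    [
        {"params": early_params},                      # uses the base stepsize
        {"params": late_params, "lr": base_lr * 0.1},  # scaled-down stepsize
    ],
    lr=base_lr,
    momentum=0.9,
)
```

In practice the relative scaling would be tuned so that the individual bias-variance tradeoffs align rather than superimpose, which is the mechanism the abstract credits for the improved early stopping performance.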
