Paper Title
Learning Rate Annealing Can Provably Help Generalization, Even for Convex Problems
Paper Authors
Paper Abstract
Learning rate schedule can significantly affect generalization performance in modern neural networks, but the reasons for this are not yet understood. Li-Wei-Ma (2019) recently proved this behavior can exist in a simplified non-convex neural-network setting. In this note, we show that this phenomenon can exist even for convex learning problems -- in particular, linear regression in 2 dimensions. We give a toy convex problem where learning rate annealing (large initial learning rate, followed by small learning rate) can lead gradient descent to minima with provably better generalization than using a small learning rate throughout. In our case, this occurs due to a combination of the mismatch between the test and train loss landscapes, and early-stopping.
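The mechanism described in the abstract can be illustrated with a small numerical sketch. The code below is not the paper's construction: it assumes a hypothetical 2D quadratic train loss whose flat direction is a sharp direction of a mismatched test loss, runs gradient descent for a fixed step budget (early stopping), and compares an annealed schedule against a small constant learning rate. The matrices `A_train` and `A_test`, the step budget, and the learning rates are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Hypothetical 2D quadratic losses with mismatched curvature (assumed values).
A_train = np.diag([10.0, 0.1])  # train loss is flat in the second coordinate
A_test = np.diag([0.1, 10.0])   # test loss is sharp in the second coordinate

def train_grad(w):
    """Gradient of the train loss 0.5 * w^T A_train w."""
    return A_train @ w

def test_loss(w):
    """Test loss 0.5 * w^T A_test w."""
    return 0.5 * w @ (A_test @ w)

def run_gd(lr_at_step, w0, steps):
    """Gradient descent on the train loss with a per-step learning rate."""
    w = w0.copy()
    for t in range(steps):
        w = w - lr_at_step(t) * train_grad(w)
    return w

w0 = np.array([1.0, 1.0])
steps = 100  # fixed budget: both runs stop early, before convergence

# Annealed schedule: large initial learning rate, then a small one.
w_annealed = run_gd(lambda t: 0.15 if t < steps // 2 else 0.01, w0, steps)
# Small constant learning rate throughout.
w_small = run_gd(lambda t: 0.01, w0, steps)

print("test loss, annealed LR     :", test_loss(w_annealed))
print("test loss, small LR always :", test_loss(w_small))
```

Under these assumed values, the annealed run makes more progress within the step budget along the direction that is flat in the train loss but sharp in the test loss, so it stops at a point with lower test loss, consistent with the train/test mismatch plus early-stopping mechanism stated in the abstract.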