Paper title
Quadratic models for understanding catapult dynamics of neural networks
Authors
Abstract
While neural networks can be approximated by linear models as their width increases, certain properties of wide neural networks cannot be captured by linear models. In this work we show that recently proposed Neural Quadratic Models can exhibit the "catapult phase" [Lewkowycz et al. 2020] that arises when training such models with large learning rates. We then empirically show that the behaviour of neural quadratic models parallels that of neural networks in generalization, especially in the catapult phase regime. Our analysis further demonstrates that quadratic models can be an effective tool for analysis of neural networks.
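To make the "catapult phase" concrete, here is a minimal sketch. It is not the paper's Neural Quadratic Model construction. It assumes the two-layer linear toy model in the spirit of Lewkowycz et al. 2020, f(u, v) = u·v / sqrt(m), trained on a single example with input 1 and target 0. This model is exactly quadratic in its parameters. The function name `train`, the deterministic initialization, the width `m`, and the step count are illustrative assumptions. The learning rate is expressed in units of 1/λ₀, where λ₀ is the initial tangent kernel value.

```python
import numpy as np

def train(eta_scale, m=512, steps=200):
    """Gradient descent on L = f^2 / 2 with f(u, v) = u @ v / sqrt(m).

    eta_scale is the learning rate in units of 1 / lambda_0, where
    lambda_0 = (|u|^2 + |v|^2) / m is the tangent kernel at initialization.
    """
    # Deterministic initialization (an assumption, for reproducibility): f_0 = 1.
    u = np.ones(m)
    v = np.ones(m) / np.sqrt(m)
    lam0 = (u @ u + v @ v) / m
    eta = eta_scale / lam0

    losses, kernels = [], []
    for _ in range(steps):
        f = u @ v / np.sqrt(m)
        losses.append(0.5 * f**2)
        kernels.append((u @ u + v @ v) / m)
        # Gradients of L = f^2 / 2, both evaluated at the current (u, v).
        grad_u = f * v / np.sqrt(m)
        grad_v = f * u / np.sqrt(m)
        u, v = u - eta * grad_u, v - eta * grad_v
    return eta, np.array(losses), np.array(kernels)

# Small learning rate (eta < 2 / lambda_0): the loss decays, the kernel barely moves.
eta_s, loss_s, ker_s = train(eta_scale=0.5)
# Large learning rate (2 / lambda_0 < eta < 4 / lambda_0): catapult regime.
eta_c, loss_c, ker_c = train(eta_scale=3.0)

print("small lr: peak loss / initial loss =", loss_s.max() / loss_s[0])
print("large lr: peak loss / initial loss =", loss_c.max() / loss_c[0])
print("large lr: final kernel =", ker_c[-1], " threshold 2/eta =", 2 / eta_c)
```

With the large learning rate, the loss first grows well above its initial value. Meanwhile the kernel value shrinks until it falls below 2/η, after which training converges. A purely linear model has a constant kernel and would simply diverge at this learning rate, which is why a quadratic term is needed to reproduce the effect.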