Paper Title

Which Minimizer Does My Neural Network Converge To?

Authors

Manuel Nonnenmacher, David Reeb, Ingo Steinwart

Abstract

The loss surface of an overparameterized neural network (NN) possesses many global minima of zero training error. We explain how common variants of the standard NN training procedure change the minimizer obtained. First, we make explicit how the size of the initialization of a strongly overparameterized NN affects the minimizer and can deteriorate its final test performance. We propose a strategy to limit this effect. Then, we demonstrate that for adaptive optimization such as AdaGrad, the obtained minimizer generally differs from the gradient descent (GD) minimizer. This adaptive minimizer is changed further by stochastic mini-batch training, even though in the non-adaptive case, GD and stochastic GD result in essentially the same minimizer. Lastly, we explain that these effects remain relevant for less overparameterized NNs. While overparameterization has its benefits, our work highlights that it induces sources of error absent from underparameterized models.
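
Below is a minimal, self-contained numpy sketch (not taken from the paper) of the phenomenon the abstract describes, using overparameterized linear least squares as a stand-in for an overparameterized NN: gradient descent from a small (zero) initialization, gradient descent from a large initialization, and AdaGrad all reach numerically zero training error, yet end up at different minimizers. The problem sizes, learning rates, and step counts are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Overparameterized least squares: fewer samples (n) than parameters (d),
# so the training loss has infinitely many global minima with zero error.
rng = np.random.default_rng(0)
n, d = 20, 100
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

def grad(w):
    """Gradient of the squared loss 0.5 * ||X w - y||^2."""
    return X.T @ (X @ w - y)

def run_gd(w0, lr=1e-3, steps=50_000):
    """Plain gradient descent from initialization w0."""
    w = w0.copy()
    for _ in range(steps):
        w -= lr * grad(w)
    return w

def run_adagrad(w0, lr=0.1, steps=50_000, eps=1e-8):
    """AdaGrad: per-coordinate step sizes from accumulated squared gradients."""
    w = w0.copy()
    acc = np.zeros(d)
    for _ in range(steps):
        g = grad(w)
        acc += g ** 2
        w -= lr * g / (np.sqrt(acc) + eps)
    return w

w_gd_small = run_gd(np.zeros(d))               # GD, small (zero) initialization
w_gd_large = run_gd(5.0 * rng.normal(size=d))  # GD, large random initialization
w_ada      = run_adagrad(np.zeros(d))          # adaptive optimizer (AdaGrad)

for name, w in [("GD, zero init", w_gd_small),
                ("GD, large init", w_gd_large),
                ("AdaGrad, zero init", w_ada)]:
    print(f"{name:>20}: train loss = {0.5 * np.sum((X @ w - y) ** 2):.2e}")

# All three reach (numerically) zero training error, yet the minimizers differ:
print("||w_gd(zero) - w_gd(large)|| =", np.linalg.norm(w_gd_small - w_gd_large))
print("||w_gd(zero) - w_adagrad||   =", np.linalg.norm(w_gd_small - w_ada))
```

For this linear model, GD started at zero is known to converge to the minimum-norm interpolating solution, whereas the large-initialization and AdaGrad runs typically land at different zero-error solutions; this mirrors, in a toy setting, the abstract's claims that initialization scale and adaptive optimization change the minimizer obtained.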
