Paper Title
On the symmetries in the dynamics of wide two-layer neural networks
Paper Authors
Paper Abstract
We consider the idealized setting of gradient flow on the population risk for infinitely wide two-layer ReLU neural networks (without bias), and study the effect of symmetries on the learned parameters and predictors. We first describe a general class of symmetries which, when satisfied by the target function $f^*$ and the input distribution, are preserved by the dynamics. We then study more specific cases. When $f^*$ is odd, we show that the dynamics of the predictor reduces to that of a (non-linearly parameterized) linear predictor, and its exponential convergence can be guaranteed. When $f^*$ has a low-dimensional structure, we prove that the gradient flow PDE reduces to a lower-dimensional PDE. Furthermore, we present informal and numerical arguments that suggest that the input neurons align with the lower-dimensional structure of the problem.
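The abstract refers to the gradient flow PDE on the population risk. For orientation, a hedged restatement of the standard mean-field formulation used in the wide two-layer network literature is given below; the symbols $\mu_t$, $\rho$, and $\sigma$ are our notational assumptions and are not taken from the abstract itself:
$$
h_\mu(x) = \int a\,\sigma(\langle w, x\rangle)\,\mathrm{d}\mu(a, w),
\qquad
F(\mu) = \tfrac12\,\mathbb{E}_{x\sim\rho}\big[(h_\mu(x) - f^*(x))^2\big],
$$
$$
\partial_t \mu_t = \mathrm{div}\big(\mu_t\,\nabla_{(a,w)} F'[\mu_t]\big),
\qquad
F'[\mu](a, w) = \mathbb{E}_{x\sim\rho}\big[(h_\mu(x) - f^*(x))\,a\,\sigma(\langle w, x\rangle)\big].
$$
As a purely illustrative complement (not the paper's experiments; all names, the target $f^*(x) = |x_1|$, and every hyperparameter below are assumptions), the following sketch trains a wide two-layer ReLU network without bias by gradient descent on an empirical proxy of the population risk, where the target depends on a one-dimensional structure, and reports how strongly the input weights align with that direction:

```python
# Minimal sketch: wide two-layer ReLU net without bias, mean-field-style
# parameterization, gradient descent on a large-sample proxy of the
# population squared risk, target f*(x) = |x_1| (one-dimensional structure).
import numpy as np

rng = np.random.default_rng(0)

d, m, n = 10, 512, 4096          # input dim, hidden width, sample size (population proxy)
lr, steps = 0.05, 2000           # illustrative hyperparameters

X = rng.standard_normal((n, d))  # inputs ~ N(0, I_d)
y = np.abs(X[:, 0])              # target f*(x) = |x_1| = relu(x_1) + relu(-x_1)

# Predictor: f(x) = (1/m) * sum_j a_j * relu(<w_j, x>)
W = rng.standard_normal((m, d))
W /= np.linalg.norm(W, axis=1, keepdims=True)
a = rng.choice([-1.0, 1.0], size=m)

for _ in range(steps):
    pre = X @ W.T                           # (n, m) pre-activations
    act = np.maximum(pre, 0.0)              # ReLU
    pred = act @ a / m
    resid = pred - y                        # residual of the squared loss

    grad_a = act.T @ resid / (n * m)
    grad_W = ((resid[:, None] * (pre > 0)) * a).T @ X / (n * m)

    a -= lr * m * grad_a                    # rescale by m so neurons move at order-1 speed
    W -= lr * m * grad_W

# Alignment of each input weight w_j with the relevant direction e_1
norms = np.linalg.norm(W, axis=1) + 1e-12
alignment = np.abs(W[:, 0]) / norms         # |cos(angle(w_j, e_1))|
weights = np.abs(a) * norms                 # weight each neuron by its contribution
print("weighted mean |cos(w_j, e_1)|:", float(np.sum(weights * alignment) / np.sum(weights)))
```

In this toy setup, the printed alignment score approaching 1 would be consistent with the abstract's claim that the neurons' input weights concentrate on the lower-dimensional structure of the problem; it is a sanity-check sketch, not a reproduction of the paper's results.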