Paper Title
On the symmetries in the dynamics of wide two-layer neural networks
Paper Authors
Paper Abstract
We consider the idealized setting of gradient flow on the population risk for infinitely wide two-layer ReLU neural networks (without bias), and study the effect of symmetries on the learned parameters and predictors. We first describe a general class of symmetries which, when satisfied by the target function $f^*$ and the input distribution, are preserved by the dynamics. We then study more specific cases. When $f^*$ is odd, we show that the dynamics of the predictor reduces to that of a (non-linearly parameterized) linear predictor, and its exponential convergence can be guaranteed. When $f^*$ has a low-dimensional structure, we prove that the gradient flow PDE reduces to a lower-dimensional PDE. Furthermore, we present informal and numerical arguments that suggest that the input neurons align with the lower-dimensional structure of the problem.
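The abstract refers to the gradient flow PDE on the population risk. For orientation, a hedged restatement of the standard mean-field formulation used in the wide two-layer network literature is given below; the symbols $\mu_t$, $\rho$, and $\sigma$ are our notational assumptions and are not taken from the abstract itself:
$$
h_\mu(x) = \int a\,\sigma(\langle w, x\rangle)\,\mathrm{d}\mu(a, w),
\qquad
F(\mu) = \tfrac12\,\mathbb{E}_{x\sim\rho}\big[(h_\mu(x) - f^*(x))^2\big],
$$
$$
\partial_t \mu_t = \mathrm{div}\big(\mu_t\,\nabla_{(a,w)} F'[\mu_t]\big),
\qquad
F'[\mu](a, w) = \mathbb{E}_{x\sim\rho}\big[(h_\mu(x) - f^*(x))\,a\,\sigma(\langle w, x\rangle)\big].
$$
As a purely illustrative complement (not the paper's experiments; all names, the target $f^*(x) = |x_1|$, and every hyperparameter below are assumptions), the following sketch trains a wide two-layer ReLU network without bias by gradient descent on an empirical proxy of the population risk, where the target depends on a one-dimensional structure, and reports how strongly the input weights align with that direction:

```python
# Minimal sketch: wide two-layer ReLU net without bias, mean-field-style
# parameterization, gradient descent on a large-sample proxy of the
# population squared risk, target f*(x) = |x_1| (one-dimensional structure).
import numpy as np

rng = np.random.default_rng(0)

d, m, n = 10, 512, 4096          # input dim, hidden width, sample size (population proxy)
lr, steps = 0.05, 2000           # illustrative hyperparameters

X = rng.standard_normal((n, d))  # inputs ~ N(0, I_d)
y = np.abs(X[:, 0])              # target f*(x) = |x_1| = relu(x_1) + relu(-x_1)

# Predictor: f(x) = (1/m) * sum_j a_j * relu(<w_j, x>)
W = rng.standard_normal((m, d))
W /= np.linalg.norm(W, axis=1, keepdims=True)
a = rng.choice([-1.0, 1.0], size=m)

for _ in range(steps):
    pre = X @ W.T                           # (n, m) pre-activations
    act = np.maximum(pre, 0.0)              # ReLU
    pred = act @ a / m
    resid = pred - y                        # residual of the squared loss

    grad_a = act.T @ resid / (n * m)
    grad_W = ((resid[:, None] * (pre > 0)) * a).T @ X / (n * m)

    a -= lr * m * grad_a                    # rescale by m so neurons move at order-1 speed
    W -= lr * m * grad_W

# Alignment of each input weight w_j with the relevant direction e_1
norms = np.linalg.norm(W, axis=1) + 1e-12
alignment = np.abs(W[:, 0]) / norms         # |cos(angle(w_j, e_1))|
weights = np.abs(a) * norms                 # weight each neuron by its contribution
print("weighted mean |cos(w_j, e_1)|:", float(np.sum(weights * alignment) / np.sum(weights)))
```

In this toy setup, the printed alignment score approaching 1 would be consistent with the abstract's claim that the neurons' input weights concentrate on the lower-dimensional structure of the problem; it is a sanity-check sketch, not a reproduction of the paper's results.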