Paper Title
Improved Overparametrization Bounds for Global Convergence of Stochastic Gradient Descent for Shallow Neural Networks
Paper Authors
Paper Abstract
We study the overparametrization bounds required for the global convergence of the stochastic gradient descent algorithm for a class of one-hidden-layer feed-forward neural networks, covering most of the activation functions used in practice, including ReLU. We improve on the existing state-of-the-art results in terms of the required hidden-layer width. We introduce a new proof technique combining nonlinear analysis with properties of the random initialization of the network. First, we establish the global convergence of continuous solutions of the differential inclusion that is a nonsmooth analogue of the gradient flow for the MSE loss. Second, we provide a technical result (which also applies to general approximators) relating solutions of the aforementioned differential inclusion to the (discrete) stochastic gradient descent sequences, hence establishing linear convergence towards zero loss for the stochastic gradient descent iterations.
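For concreteness, the objects named in the abstract can be sketched as follows; this is a minimal illustrative formulation under standard assumptions, with notation chosen here rather than taken from the paper. For a one-hidden-layer network of width $m$ with activation $\sigma$ (e.g. ReLU), the MSE loss on $n$ samples is
\[
L(\theta) \;=\; \frac{1}{2n}\sum_{i=1}^{n}\bigl(f(x_i;\theta)-y_i\bigr)^2,
\qquad
f(x;\theta) \;=\; \sum_{j=1}^{m} a_j\,\sigma\!\bigl(w_j^{\top}x\bigr).
\]
The smooth gradient flow and its nonsmooth analogue (the differential inclusion, with $\partial L$ the Clarke subdifferential, needed since ReLU is not differentiable at $0$) are
\[
\dot{\theta}(t) \;=\; -\nabla L\bigl(\theta(t)\bigr)
\qquad\text{and}\qquad
\dot{\theta}(t) \;\in\; -\partial L\bigl(\theta(t)\bigr),
\]
while the discrete stochastic gradient descent iterations with step size $\eta$ take the form
\[
\theta_{k+1} \;=\; \theta_k - \eta\, g_k,
\qquad
g_k \in \partial L_{i_k}(\theta_k),
\quad
i_k \sim \mathrm{Unif}\{1,\dots,n\},
\]
where $L_{i}$ is the loss on the $i$-th sample. Linear convergence towards zero loss means a bound of the type $L(\theta_k) \le (1-c\eta)^{k}\,L(\theta_0)$ for some constant $c>0$; the overparametrization bounds of the paper concern how large the width $m$ must be for such a guarantee to hold from random initialization.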