Title
Asymptotic convergence rate of Dropout on shallow linear neural networks
Authors
Abstract
We analyze the convergence rate of gradient flows on objective functions induced by Dropout and Dropconnect, when applying them to shallow linear Neural Networks (NNs) - which can also be viewed as doing matrix factorization using a particular regularizer. Dropout algorithms such as these are thus regularization techniques that use {0,1}-valued random variables to filter weights during training in order to avoid coadaptation of features. By leveraging a recent result on nonconvex optimization and conducting a careful analysis of the set of minimizers as well as the Hessian of the loss function, we are able to obtain (i) a local convergence proof of the gradient flow and (ii) a bound on the convergence rate that depends on the data, the dropout probability, and the width of the NN. Finally, we compare this theoretical bound to numerical simulations, which are in qualitative agreement with the convergence bound and match it when starting sufficiently close to a minimizer.
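To illustrate the "objective function induced by Dropout" mentioned in the abstract, the sketch below compares a Monte Carlo estimate of the dropout loss of a shallow linear NN with one common closed-form expectation, which takes the shape of a matrix-factorization loss plus a data-dependent regularizer. This is only a hedged illustration: the dimensions, variable names, and the specific choice of dropping hidden units (rather than individual weights, as in Dropconnect) are assumptions for the example and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for illustration: input dim d, hidden width r, output dim m, n samples.
d, r, m, n = 5, 8, 4, 50
p = 0.7  # probability of keeping a hidden unit

X = rng.standard_normal((d, n))
Y = rng.standard_normal((m, n))
W1 = rng.standard_normal((r, d))   # first-layer weights
W2 = rng.standard_normal((m, r))   # second-layer weights

def dropout_loss(W1, W2, X, Y, mask):
    """Squared loss of the shallow linear NN with a {0,1}-valued mask filtering hidden units."""
    return np.sum((Y - W2 @ np.diag(mask) @ W1 @ X) ** 2)

# Monte Carlo estimate of the induced objective E_b ||Y - W2 diag(b) W1 X||_F^2,
# with b_i i.i.d. Bernoulli(p).
mc = np.mean([dropout_loss(W1, W2, X, Y, rng.binomial(1, p, size=r))
              for _ in range(20000)])

# Closed-form expectation under this setup: a matrix-factorization term ||Y - p W2 W1 X||_F^2
# plus the regularizer p(1-p) * sum_i ||W2[:, i]||^2 ||W1[i, :] X||^2.
Z = W1 @ X
reg = p * (1 - p) * np.sum(np.sum(W2 ** 2, axis=0) * np.sum(Z ** 2, axis=1))
closed_form = np.sum((Y - p * W2 @ W1 @ X) ** 2) + reg

print(mc, closed_form)  # the two values should agree up to Monte Carlo error
```

The gradient flow analyzed in the paper is taken on such an expected (regularized) objective; the sketch only shows how the random {0,1}-valued filtering of weights gives rise to it in expectation.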