Paper Title
Global Convergence of Over-parameterized Deep Equilibrium Models
Paper Authors
Paper Abstract
A deep equilibrium model (DEQ) is defined implicitly through an equilibrium point of an infinite-depth, weight-tied model with input injection. Instead of performing infinitely many computations, it solves for the equilibrium point directly via root-finding and computes gradients with implicit differentiation. This study investigates the training dynamics of over-parameterized DEQs. Assuming a condition on the initial equilibrium point, we show that a unique equilibrium point exists throughout training, and we prove that gradient descent converges to a globally optimal solution at a linear rate for the quadratic loss function. To show that the required initial condition is satisfied under mild over-parameterization, we perform a fine-grained analysis of random DEQs. We propose a novel probabilistic framework to overcome the technical difficulty in the non-asymptotic analysis of infinite-depth weight-tied models.
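The forward/backward computation described in the abstract can be sketched in a few lines of NumPy: solve z* = f(z*, x) by fixed-point iteration (a simple stand-in for the Broyden/Newton root-finders used in practice), then obtain the weight gradient from the implicit function theorem instead of backpropagating through the iterations. The toy layer f(z, x) = tanh(Wz + Ux), the dimensions, the scaling of W, and the quadratic target are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
# Toy weight-tied layer f(z, x) = tanh(W z + U x); the small scale on W makes
# the map a contraction, so a unique equilibrium point exists.
W = 0.2 * rng.standard_normal((d, d)) / np.sqrt(d)
U = rng.standard_normal((d, d)) / np.sqrt(d)
x = rng.standard_normal(d)

def f(z):
    return np.tanh(W @ z + U @ x)

# Forward pass: solve z* = f(z*) by naive fixed-point iteration
# (a stand-in for the root-finding solvers used in real DEQ implementations).
z = np.zeros(d)
for _ in range(500):
    z_next = f(z)
    if np.linalg.norm(z_next - z) < 1e-13:
        z = z_next
        break
    z = z_next

# Backward pass: implicit differentiation. With a = W z* + U x and
# J = df/dz evaluated at z*, the implicit function theorem gives
# dz*/dtheta = (I - J)^{-1} df/dtheta, so no backprop through iterations.
a = W @ z + U @ x
sech2 = 1.0 - np.tanh(a) ** 2          # tanh'(a)
J = sech2[:, None] * W                 # Jacobian of f w.r.t. z at z*
target = np.ones(d)
dL_dz = z - target                     # quadratic loss 0.5 * ||z* - target||^2
v = np.linalg.solve((np.eye(d) - J).T, dL_dz)   # v = (I - J)^{-T} dL/dz*
grad_W = np.outer(sech2 * v, z)        # dL/dW, exact at the equilibrium
```

The gradient costs one linear solve at the equilibrium, independent of how many iterations the forward solver took, which is the practical appeal of implicit differentiation for DEQs.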