Paper Title
Black Box Lie Group Preconditioners for SGD
Paper Authors
Paper Abstract
A matrix-free and a low-rank approximation preconditioner are proposed to accelerate the convergence of stochastic gradient descent (SGD) by exploiting curvature information sampled from Hessian-vector products or from finite differences of parameters and gradients, similar to the BFGS algorithm. Both preconditioners are fitted in an online fashion by minimizing a criterion that is free of line search and robust to stochastic gradient noise, and are further constrained to lie on certain connected Lie groups to preserve the corresponding symmetry or invariance, e.g., preservation of coordinate orientation by the connected general linear group with positive determinant. The Lie group's equivariance property facilitates preconditioner fitting, and its invariance property removes the need for damping, which is common in second-order optimizers but difficult to tune. The learning rate for parameter updating and the step size for preconditioner fitting are naturally normalized, and their default values work well in most situations.
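To make the fitting recipe concrete, below is a minimal sketch in PyTorch (an assumption; it does not reproduce the paper's reference code). It uses the simplest connected Lie group, positive diagonal matrices, rather than the paper's matrix-free or low-rank groups; the transferable parts are the fitting criterion c(P) = E[h'Ph + v'P^{-1}v] used across this line of work and the normalized multiplicative update. All names (`model`, `loss_fn`, `fit_diag_precond`, `mu`, `lr`) are illustrative placeholders.

```python
import torch

# -- toy setup (placeholders; the method is architecture-agnostic) --
torch.manual_seed(0)
model = torch.nn.Linear(10, 1)
loss_fn = torch.nn.MSELoss()
x, y = torch.randn(32, 10), torch.randn(32, 1)

params = list(model.parameters())
ps = [torch.ones_like(w) for w in params]  # diagonal preconditioner, P = I initially
lr, mu = 0.01, 0.1                         # normalized defaults, both in (0, 1)

def fit_diag_precond(p, v, h, mu):
    """One online update of a diagonal preconditioner (hypothetical variant).
    Minimizes the criterion c(P) = h'Ph + v'P^{-1}v per coordinate:
    c(p) = p*h**2 + v**2/p, whose exact minimizer is p = |v/h|.
    """
    a = p * h * h
    b = v * v / p
    # gradient of c w.r.t. log p, normalized to [-1, 1] so a fixed mu
    # needs no line search and tolerates stochastic gradient noise
    grad = (a - b) / (a + b + 1e-30)
    # multiplicative update keeps p strictly positive, i.e., on the
    # connected Lie group of positive diagonal matrices
    return p * torch.exp(-mu * grad)

for step in range(100):
    loss = loss_fn(model(x), y)
    grads = torch.autograd.grad(loss, params, create_graph=True)

    # sample a curvature pair (v, h = Hv) via a Hessian-vector product
    vs = [torch.randn_like(w) for w in params]
    hs = torch.autograd.grad(grads, params, grad_outputs=vs)

    with torch.no_grad():
        for i, (w, g, v, h) in enumerate(zip(params, grads, vs, hs)):
            ps[i] = fit_diag_precond(ps[i], v, h, mu)  # fit preconditioner
            w -= lr * ps[i] * g                        # preconditioned SGD step
```

Note the design point the abstract emphasizes: because the update is multiplicative, the preconditioner can never leave the group (here, never become zero or negative), so no damping term is needed, and because the group gradient is normalized, the same default step size works across problems.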