Paper title
SGEM: stochastic gradient with energy and momentum
Paper authors
Paper abstract
In this paper, we propose SGEM, Stochastic Gradient with Energy and Momentum, to solve a large class of general non-convex stochastic optimization problems, building on the AEGD method introduced in [AEGD: Adaptive Gradient Descent with Energy, arXiv:2010.05109]. SGEM incorporates energy and momentum simultaneously so as to inherit their combined advantages. We show that SGEM features an unconditional energy stability property, and we derive energy-dependent convergence rates in the general non-convex stochastic setting, as well as a regret bound in the online convex setting. A lower threshold for the energy variable is also provided. Our experimental results show that SGEM converges faster than AEGD and generalizes better than, or at least as well as, SGDM in training some deep neural networks.
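To make the energy-plus-momentum idea concrete, below is a minimal NumPy sketch of an AEGD-style step augmented with a momentum buffer. The function name sgem_step, the placement of the momentum term, and the constant c are illustrative assumptions modeled on the AEGD update of arXiv:2010.05109; they are not necessarily the exact SGEM algorithm, which is specified in the paper itself.

```python
import numpy as np

def sgem_step(theta, grad, loss, r, m, eta=0.1, beta=0.9, c=1.0):
    """One illustrative energy-plus-momentum update (hypothetical form).

    Mirrors the AEGD structure: the loss is transformed as sqrt(loss + c),
    an elementwise energy variable r rescales the step, and a momentum
    buffer m smooths the transformed gradient. The exact SGEM update is
    given in the paper; this sketch only conveys the idea.
    """
    v = grad / (2.0 * np.sqrt(loss + c))   # gradient of sqrt(f + c)
    m = beta * m + (1.0 - beta) * v        # assumed placement of the momentum term
    r = r / (1.0 + 2.0 * eta * m * m)      # energy never increases and stays nonnegative
    theta = theta - 2.0 * eta * r * m      # step scaled by the energy variable
    return theta, r, m

# Toy usage on f(theta) = 0.5 * ||theta||^2
theta = np.array([2.0, -1.5])
r = np.full_like(theta, np.sqrt(0.5 * theta @ theta + 1.0))  # r_0 = sqrt(f_0 + c)
m = np.zeros_like(theta)
for _ in range(200):
    loss, grad = 0.5 * theta @ theta, theta
    theta, r, m = sgem_step(theta, grad, loss, r, m)
print(theta)  # should approach the minimizer at the origin
```

In this sketch the energy variable r can only decrease, which is the mechanism behind the unconditional energy stability claimed for AEGD-type methods, while the momentum buffer m plays the same smoothing role as in SGDM.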