Paper Title

Convergence of ease-controlled Random Reshuffling gradient Algorithms under Lipschitz smoothness

Authors

Ruggiero Seccia, Corrado Coppola, Giampaolo Liuzzi, Laura Palagi

Abstract

In this work, we consider minimizing the average of a very large number of smooth and possibly non-convex functions, and we focus on two widely used minibatch frameworks to tackle this optimization problem: Incremental Gradient (IG) and Random Reshuffling (RR). We define ease-controlled modifications of the IG/RR schemes, which require a light additional computational effort but can be proved to converge under weak and standard assumptions. In particular, we define two algorithmic schemes in which the IG/RR iteration is controlled by using a watchdog rule and a derivative-free linesearch that activates only sporadically to guarantee convergence. The two schemes differ in the watchdog and the linesearch, which are performed using either a monotonic or a non-monotonic rule. The two schemes also allow controlling the updating of the stepsize used in the main IG/RR iteration, avoiding the use of pre-set rules that may drive the stepsize to zero too fast, reducing the effort in designing effective updating rules of the stepsize. We prove convergence under the mild assumption of Lipschitz continuity of the gradients of the component functions and perform an extensive computational analysis using different deep neural architectures and a benchmark of varying-size datasets. We compare our implementation with both a full batch gradient method (i.e., L-BFGS) and an implementation of IG/RR methods, proving that our algorithms require a similar computational effort compared to the other online algorithms and that the control on the learning rate may allow a faster decrease of the objective function.
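To make the idea described in the abstract concrete, below is a minimal sketch, based only on the abstract, of what an ease-controlled Random Reshuffling loop could look like: an ordinary RR epoch runs with a fixed inner stepsize, a watchdog test checks the full objective at the end of the epoch, and a derivative-free linesearch along the epoch displacement is triggered only when that test fails. The function names (`ease_controlled_rr`, `rr_epoch`), the specific sufficient-decrease test, the halving factors, and the tolerances are illustrative assumptions, not the algorithms analyzed in the paper.

```python
import numpy as np

def rr_epoch(x, grads, stepsize, rng):
    """One Random Reshuffling epoch: visit the component gradients
    in a freshly shuffled order and take an incremental step on each."""
    for i in rng.permutation(len(grads)):
        x = x - stepsize * grads[i](x)
    return x

def ease_controlled_rr(f, grads, x0, stepsize=0.1, gamma=1e-4,
                       max_epochs=100, seed=0):
    """Hypothetical sketch of an ease-controlled RR loop (not the paper's
    exact scheme): a watchdog compares the full objective at the end of
    each epoch with a reference value; only when sufficient decrease
    fails does a derivative-free linesearch along the epoch displacement
    choose the accepted point, and the inner stepsize is reduced."""
    rng = np.random.default_rng(seed)
    x, f_ref = x0, f(x0)
    for _ in range(max_epochs):
        x_trial = rr_epoch(x, grads, stepsize, rng)
        if f(x_trial) <= f_ref - gamma * stepsize:   # watchdog test (assumed form)
            x, f_ref = x_trial, f(x_trial)
            continue
        # Sporadic derivative-free linesearch along d = x_trial - x.
        d, alpha = x_trial - x, 1.0
        while alpha > 1e-8 and f(x + alpha * d) > f_ref - gamma * alpha:
            alpha *= 0.5
        x = x + alpha * d
        f_ref = f(x)
        stepsize *= 0.5          # tighten the inner stepsize only when needed
    return x

# Toy usage: minimize the average of N convex quadratics.
if __name__ == "__main__":
    N, dim = 50, 10
    rng = np.random.default_rng(1)
    centers = rng.normal(size=(N, dim))
    f = lambda x: np.mean(np.sum((x - centers) ** 2, axis=1)) / 2
    grads = [lambda x, c=c: x - c for c in centers]
    x_star = ease_controlled_rr(f, grads, np.zeros(dim))
    print("final objective:", f(x_star))
```

On this toy quadratic problem the loop behaves like plain RR in most epochs; the linesearch and stepsize reduction are touched only when an epoch fails the watchdog test, which is the "light additional computational effort" the abstract refers to.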
