Paper Title

On the Benefits of Multiple Gossip Steps in Communication-Constrained Decentralized Optimization

Paper Authors

Hashemi, Abolfazl; Acharya, Anish; Das, Rudrajit; Vikalo, Haris; Sanghavi, Sujay; Dhillon, Inderjit

Paper Abstract

In decentralized optimization, it is common algorithmic practice to have nodes interleave (local) gradient descent iterations with gossip (i.e., averaging over the network) steps. Motivated by the training of large-scale machine learning models, it is also increasingly common to require that messages be {\em lossy compressed} versions of the local parameters. In this paper, we show that, in such compressed decentralized optimization settings, there are benefits to having {\em multiple} gossip steps between subsequent gradient iterations, even when the cost of doing so is appropriately accounted for, e.g., by reducing the precision of the compressed information. In particular, we show that having $O(\log\frac{1}{\epsilon})$ gradient iterations with a constant step size, and $O(\log\frac{1}{\epsilon})$ gossip steps between every pair of these iterations, enables convergence to within $\epsilon$ of the optimal value for smooth non-convex objectives satisfying the Polyak-Łojasiewicz condition. This result also holds for smooth strongly convex objectives. To our knowledge, this is the first work to derive convergence results for non-convex optimization under arbitrary communication compression.
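
For reference, a differentiable function $f$ satisfies the Polyak-Łojasiewicz (PL) condition with parameter $\mu > 0$ if $\|\nabla f(x)\|^2 \ge 2\mu\,(f(x) - f^*)$ for all $x$, where $f^*$ is the optimal value. The sketch below illustrates the interleaving pattern the abstract describes: each node takes a local gradient step with a constant step size, then the network runs several compressed gossip (averaging) rounds before the next gradient step. The ring topology, quadratic local objectives, and uniform quantizer are illustrative assumptions, not the authors' construction.

```python
# Minimal sketch (not the paper's implementation) of decentralized optimization
# with multiple compressed gossip rounds between consecutive gradient steps.
import numpy as np

rng = np.random.default_rng(0)
n_nodes, dim = 8, 5

# Hypothetical local objectives: f_i(x) = 0.5 * ||x - b_i||^2 (smooth, strongly convex),
# so the global minimizer of (1/n) * sum_i f_i is the mean of the b_i.
b = rng.normal(size=(n_nodes, dim))
grad = lambda i, x: x - b[i]

# Doubly stochastic gossip matrix for a ring: each node averages with its two neighbors.
W = np.zeros((n_nodes, n_nodes))
for i in range(n_nodes):
    W[i, i] = 0.5
    W[i, (i - 1) % n_nodes] = 0.25
    W[i, (i + 1) % n_nodes] = 0.25

def quantize(v, levels=16):
    """Illustrative lossy compression: uniform quantization of each coordinate."""
    scale = np.max(np.abs(v)) + 1e-12
    return np.round(v / scale * levels) / levels * scale

x = rng.normal(size=(n_nodes, dim))   # one parameter vector per node
eta, T, Q = 0.5, 30, 10               # constant step size, gradient iterations, gossip rounds per iteration

for _ in range(T):
    # Local gradient step at every node.
    x = x - eta * np.array([grad(i, x[i]) for i in range(n_nodes)])
    # Multiple gossip rounds on compressed messages between consecutive gradient steps.
    for _ in range(Q):
        x = W @ np.array([quantize(x[i]) for i in range(n_nodes)])

x_star = b.mean(axis=0)
print("max node error:", np.max(np.linalg.norm(x - x_star, axis=1)))
```

In the regime the abstract describes, the extra gossip rounds are paid for by compressing each message more aggressively (fewer quantization levels), so the per-iteration communication budget stays comparable while consensus error shrinks.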
