Paper Title

Decentralized gradient methods: does topology matter?

Authors

Giovanni Neglia, Chuan Xu, Don Towsley, Gianmarco Calbi

Abstract

Consensus-based distributed optimization methods have recently been advocated as alternatives to the parameter-server and ring all-reduce paradigms for large-scale training of machine learning models. In this setting, each worker maintains a local estimate of the optimal parameter vector and iteratively updates it by averaging the estimates obtained from its neighbors and applying a correction based on its local dataset. While theoretical results suggest that the worker communication topology should have a strong impact on the number of epochs needed to converge, previous experiments have shown the opposite. This paper sheds light on this apparent contradiction and shows how sparse topologies can lead to faster convergence even in the absence of communication delays.
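To make the update described in the abstract concrete, the snippet below is a minimal NumPy sketch of a decentralized gradient step, where each worker computes a weighted average of its neighbors' estimates and then applies a local gradient correction (roughly x_i ← Σ_j W_ij x_j − η ∇f_i). The ring topology, quadratic local losses, mixing weights, and step size are illustrative assumptions, not the paper's experimental setup.

```python
# Minimal sketch of consensus-based decentralized gradient descent
# (assumed setup: 8 workers on a ring, local least-squares losses).
import numpy as np

rng = np.random.default_rng(0)
n_workers, dim, lr = 8, 5, 0.01

# Each worker i holds a local dataset (A_i, b_i) defining the loss
# f_i(x) = 0.5 * ||A_i x - b_i||^2, and a local estimate x_i of the optimum.
A = [rng.standard_normal((20, dim)) for _ in range(n_workers)]
b = [rng.standard_normal(20) for _ in range(n_workers)]
X = np.zeros((n_workers, dim))  # row i = worker i's current estimate

# Doubly stochastic mixing matrix W for a ring: each worker averages itself
# with its two neighbors. Sparser or denser topologies change W (and its
# spectral gap), which is what the theoretical bounds depend on.
W = np.zeros((n_workers, n_workers))
for i in range(n_workers):
    W[i, i] = 1 / 3
    W[i, (i - 1) % n_workers] = 1 / 3
    W[i, (i + 1) % n_workers] = 1 / 3

def local_gradient(i, x):
    """Gradient of worker i's local least-squares loss at x."""
    return A[i].T @ (A[i] @ x - b[i])

for epoch in range(200):
    # 1) Consensus step: average neighbors' estimates (row i of W @ X).
    X_avg = W @ X
    # 2) Correction step: each worker takes a gradient step on its local data.
    X = X_avg - lr * np.array([local_gradient(i, X_avg[i])
                               for i in range(n_workers)])

# The local estimates should agree and approach the minimizer of the
# average of the local losses.
print("max disagreement across workers:", np.max(np.abs(X - X.mean(axis=0))))
```

Changing the ring to a denser graph (e.g., fully connected with W = ones/n) only modifies the mixing matrix; the per-epoch gradient work is unchanged, which is why the topology question in the title can be studied independently of communication delays.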
