Paper Title
Adaptive Distributed Stochastic Gradient Descent for Minimizing Delay in the Presence of Stragglers
Paper Authors
Paper Abstract
We consider the setting where a master wants to run a distributed stochastic gradient descent (SGD) algorithm on $n$ workers, each holding a subset of the data. Distributed SGD may suffer from the effect of stragglers, i.e., slow or unresponsive workers who cause delays. One solution studied in the literature is to wait at each iteration for the responses of the fastest $k<n$ workers before updating the model, where $k$ is a fixed parameter. The choice of the value of $k$ presents a trade-off between the runtime (i.e., convergence rate) of SGD and the error of the model. Towards optimizing the error-runtime trade-off, we investigate distributed SGD with adaptive $k$. We first design an adaptive policy for varying $k$ that optimizes this trade-off based on an upper bound, which we derive, on the error as a function of the wall-clock time. Then, we propose an algorithm for adaptive distributed SGD that is based on a statistical heuristic. We implement our algorithm and provide numerical simulations which confirm our intuition and theoretical analysis.
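The fixed-$k$ scheme the abstract describes (wait for the fastest $k$ of $n$ workers, then update) can be sketched in simulation as follows. This is a minimal illustration under assumed settings, not the authors' implementation: worker response times are modeled as i.i.d. exponential, the objective is a noiseless least-squares problem, and the step size 0.1 is an arbitrary choice.

```python
import numpy as np

def k_sync_sgd_step(w, X, y, k, rng, lr=0.1):
    """One iteration of k-sync SGD: average gradients of the k fastest workers."""
    n = len(X)  # number of workers
    # Assumed straggler model: i.i.d. exponential compute times per worker.
    times = rng.exponential(1.0, size=n)
    fastest = np.argsort(times)[:k]
    # Each selected worker computes a least-squares gradient on its data shard.
    grads = [X[i].T @ (X[i] @ w - y[i]) / len(y[i]) for i in fastest]
    w = w - lr * np.mean(grads, axis=0)
    # The iteration's wall-clock cost is the k-th fastest response time.
    return w, np.sort(times)[k - 1]

rng = np.random.default_rng(0)
n, d, m = 8, 5, 20                  # workers, features, samples per worker
w_true = rng.normal(size=d)
X = [rng.normal(size=(m, d)) for _ in range(n)]
y = [Xi @ w_true for Xi in X]       # noiseless labels for illustration
w = np.zeros(d)
elapsed = 0.0
for _ in range(200):
    w, t = k_sync_sgd_step(w, X, y, k=4, rng=rng)
    elapsed += t
```

Varying `k` across iterations, as the paper's adaptive policy does, amounts to passing a different `k` to each call: a small `k` makes each iteration fast but noisier, while a large `k` slows iterations and lowers the model error, which is exactly the trade-off the abstract highlights.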