Paper Title

Byzantine-Resilient SGD in High Dimensions on Heterogeneous Data

Paper Authors

Deepesh Data, Suhas Diggavi

Paper Abstract

We study distributed stochastic gradient descent (SGD) in the master-worker architecture under Byzantine attacks. We consider the heterogeneous data model, where different workers may have different local datasets, and we do not make any probabilistic assumptions on data generation. At the core of our algorithm, we use the polynomial-time outlier-filtering procedure for robust mean estimation proposed by Steinhardt et al. (ITCS 2018) to filter out corrupt gradients. To apply their filtering procedure in our {\em heterogeneous} data setting, where workers compute {\em stochastic} gradients, we derive a new matrix concentration result, which may be of independent interest. We provide convergence analyses for smooth strongly-convex and non-convex objectives. We derive our results under the bounded-variance assumption on local stochastic gradients and a {\em deterministic} condition on datasets, namely gradient dissimilarity; for both of these quantities, we provide concrete bounds in the statistical heterogeneous data model. We exhibit a trade-off between the mini-batch size used for stochastic gradients and the approximation error. Our algorithm can tolerate up to a $\frac{1}{4}$ fraction of Byzantine workers. It finds approximately optimal parameters exponentially fast in the strongly-convex setting and reaches an approximate stationary point at a linear rate in the non-convex setting, thus matching the convergence rates of vanilla SGD in the Byzantine-free setting. We also propose and analyze a Byzantine-resilient SGD algorithm with gradient compression, in which workers send $k$ random coordinates of their gradients. Under mild conditions, we show a $\frac{d}{k}$-factor saving in communication bits, as well as in decoding complexity, over our compression-free algorithm, without affecting its convergence rate (order-wise) or the approximation error.
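To make the pipeline concrete, below is a minimal Python sketch of one master-side iteration under the stated assumptions: workers' stochastic gradients are optionally compressed to $k$ random coordinates, a spectral outlier filter in the spirit of (but not identical to) the Steinhardt et al. procedure drops the most extreme fraction, and the master takes a plain SGD step on the filtered mean. The helper names (robust_mean, compress_k, byzantine_sgd_step) and the single-pass trimming rule are illustrative choices, not the paper's exact algorithm.

import numpy as np

def compress_k(grad, k, rng):
    # Keep k uniformly random coordinates of `grad`, zeroing the rest.
    # The d/k rescaling makes the sparse vector unbiased in expectation.
    d = grad.size
    idx = rng.choice(d, size=k, replace=False)
    out = np.zeros(d)
    out[idx] = grad[idx] * (d / k)
    return out

def robust_mean(grads, byz_frac=0.25):
    # Illustrative spectral filter: project the centered gradients onto the
    # top singular direction (the direction of largest spread, where a
    # coordinated attack would show up) and drop the most extreme fraction.
    G = np.stack(grads)                        # shape (n, d)
    mu = G.mean(axis=0)
    _, _, vt = np.linalg.svd(G - mu, full_matrices=False)
    scores = np.abs((G - mu) @ vt[0])          # spread along the worst direction
    keep = scores <= np.quantile(scores, 1 - byz_frac)
    return G[keep].mean(axis=0)

def byzantine_sgd_step(x, worker_grads, lr=0.1, k=None, rng=None):
    # One master-side step: optionally simulate k-sparse gradient reports,
    # filter out suspected Byzantine gradients, then do a vanilla SGD update.
    rng = rng or np.random.default_rng(0)
    if k is not None:
        worker_grads = [compress_k(g, k, rng) for g in worker_grads]
    g_hat = robust_mean(worker_grads)
    return x - lr * g_hat

A toy run, with 15 honest workers reporting noisy copies of the true gradient and 5 Byzantine workers sending a large common vector:

rng = np.random.default_rng(0)
d = 100
x = np.zeros(d)
honest = [-np.ones(d) + 0.1 * rng.standard_normal(d) for _ in range(15)]
byzantine = [10.0 * np.ones(d) for _ in range(5)]
x = byzantine_sgd_step(x, honest + byzantine, lr=0.5, k=20, rng=rng)

This mirrors the abstract's structure (robust aggregation tolerating up to a 1/4 fraction of Byzantine workers, plus random-$k$ coordinate compression), but the filter shown is a one-shot heuristic stand-in rather than the paper's iterative filtering procedure.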
