Paper Title
Byzantine Machine Learning Made Easy by Resilient Averaging of Momentums
Paper Authors
Paper Abstract
Byzantine resilience emerged as a prominent topic within the distributed machine learning community. Essentially, the goal is to enhance distributed optimization algorithms, such as distributed SGD, in a way that guarantees convergence despite the presence of some misbehaving (a.k.a., {\em Byzantine}) workers. Although a myriad of techniques addressing the problem have been proposed, the field arguably rests on fragile foundations. These techniques are hard to prove correct and rely on assumptions that are (a) quite unrealistic, i.e., often violated in practice, and (b) heterogeneous, i.e., making it difficult to compare approaches. We present \emph{RESAM (RESilient Averaging of Momentums)}, a unified framework that makes it simple to establish optimal Byzantine resilience, relying only on standard machine learning assumptions. Our framework is mainly composed of two operators: \emph{resilient averaging} at the server and \emph{distributed momentum} at the workers. We prove a general theorem stating the convergence of distributed SGD under RESAM. Interestingly, demonstrating and comparing the convergence of many existing techniques become direct corollaries of our theorem, without resorting to stringent assumptions. We also present an empirical evaluation of the practical relevance of RESAM.
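To illustrate the two operators the abstract names, here is a minimal, self-contained sketch of a RESAM-style training loop: each honest worker maintains a local momentum of its stochastic gradients, the server aggregates the received vectors with a resilient averaging rule, and the model is updated by a plain SGD step. The least-squares objective, the coordinate-wise median as the resilient rule, the Byzantine behavior, and all hyperparameters are illustrative assumptions, not the paper's exact algorithm or experimental setup.

```python
import numpy as np

# Hypothetical toy setup: a least-squares objective shared by all honest workers.
rng = np.random.default_rng(0)
dim, n_workers, n_byzantine = 10, 15, 3
A, b = rng.normal(size=(100, dim)), rng.normal(size=100)

def stochastic_gradient(x):
    """Mini-batch gradient of 0.5 * ||A x - b||^2 (honest worker)."""
    idx = rng.choice(len(b), size=10, replace=False)
    Ai, bi = A[idx], b[idx]
    return Ai.T @ (Ai @ x - bi) / len(idx)

def coordinate_wise_median(vectors):
    """One possible resilient averaging rule (used here for illustration)."""
    return np.median(np.stack(vectors), axis=0)

# RESAM-style loop (sketch): distributed momentum at the workers,
# resilient averaging at the server, then an SGD step on the model.
x = np.zeros(dim)
momentums = [np.zeros(dim) for _ in range(n_workers)]
lr, beta = 0.05, 0.9

for step in range(200):
    messages = []
    for w in range(n_workers):
        if w < n_byzantine:
            # A Byzantine worker may send an arbitrary vector.
            messages.append(rng.normal(scale=100.0, size=dim))
        else:
            # An honest worker sends its local momentum of stochastic gradients.
            momentums[w] = beta * momentums[w] + (1 - beta) * stochastic_gradient(x)
            messages.append(momentums[w])
    # The server aggregates with the resilient averaging rule and updates the model.
    x -= lr * coordinate_wise_median(messages)

print("final loss:", 0.5 * np.linalg.norm(A @ x - b) ** 2 / len(b))
```

Despite the injected Byzantine vectors, the median-based aggregation keeps the update close to the honest workers' momentum direction, which is the behavior the convergence theorem formalizes for resilient averaging rules in general.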