Paper Title
On Generalization of Decentralized Learning with Separable Data
Paper Authors
Paper Abstract
Decentralized learning offers privacy and communication efficiency when data are naturally distributed among agents communicating over an underlying graph. Motivated by overparameterized learning settings, in which models are trained to zero training loss, we study algorithmic and generalization properties of decentralized learning with gradient descent on separable data. Specifically, for decentralized gradient descent (DGD) and a variety of loss functions that asymptote to zero at infinity (including exponential and logistic losses), we derive novel finite-time generalization bounds. This complements a long line of recent work that studies the generalization performance and the implicit bias of gradient descent over separable data, but has thus far been limited to centralized learning scenarios. Notably, our generalization bounds approximately match, in order, their centralized counterparts. Central to this result, and of independent interest, are novel bounds on the training loss and the rate-of-consensus of DGD for a class of self-bounded losses. Finally, on the algorithmic front, we design improved gradient-based routines for decentralized learning with separable data and empirically demonstrate orders-of-magnitude speed-ups in terms of both training and generalization performance.
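To make the DGD setting concrete, below is a minimal NumPy sketch of decentralized gradient descent with logistic loss on linearly separable synthetic data: each agent mixes its model with its neighbors' models via a doubly stochastic matrix, then takes a local gradient step. The ring-graph mixing matrix, step size, iteration count, and data-generation scheme here are illustrative assumptions, not the paper's exact experimental setup.

```python
# Minimal sketch of decentralized gradient descent (DGD) with logistic loss
# on linearly separable data (illustrative assumptions throughout).
import numpy as np

rng = np.random.default_rng(0)
n_agents, n_per_agent, dim = 5, 20, 10

# Linearly separable data: labels are the sign of <w_star, x>.
w_star = rng.normal(size=dim)
X = rng.normal(size=(n_agents, n_per_agent, dim))
y = np.sign(X @ w_star)                       # shape (n_agents, n_per_agent)

# Doubly stochastic mixing matrix for a ring graph
# (self weight 1/2, each of the two neighbors 1/4).
W = np.zeros((n_agents, n_agents))
for i in range(n_agents):
    W[i, i] = 0.5
    W[i, (i - 1) % n_agents] = 0.25
    W[i, (i + 1) % n_agents] = 0.25

def local_logistic_grad(w, Xi, yi):
    """Gradient of the average logistic loss log(1 + exp(-y <w, x>)) at one agent."""
    margins = yi * (Xi @ w)
    coeff = -yi / (1.0 + np.exp(margins))     # per-sample derivative factor
    return (Xi * coeff[:, None]).mean(axis=0)

eta = 0.5                                     # step size (assumed)
w = np.zeros((n_agents, dim))                 # one local model per agent

for t in range(2000):
    grads = np.stack([local_logistic_grad(w[i], X[i], y[i]) for i in range(n_agents)])
    # DGD update: consensus (mixing) step followed by a local gradient step.
    w = W @ w - eta * grads

# Training loss at the network-average iterate; on separable data it decays toward zero.
w_avg = w.mean(axis=0)
loss = np.mean(np.log1p(np.exp(-(y * (X @ w_avg)))))
print(f"average training loss: {loss:.4f}")
```

In this sketch the consensus error (the spread of the local models around their average) and the training loss can both be tracked over iterations, which is the quantity pair the abstract's bounds are about; the ring topology can be swapped for any connected graph with a doubly stochastic mixing matrix.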