Paper Title
ZeroFL: Efficient On-Device Training for Federated Learning with Local Sparsity
Paper Authors
Paper Abstract
When the available hardware cannot meet the memory and compute requirements to efficiently train high-performing machine learning models, a compromise in either the training quality or the model complexity is needed. In Federated Learning (FL), nodes are orders of magnitude more constrained than traditional server-grade hardware and are often battery-powered, severely limiting the sophistication of models that can be trained under this paradigm. While most research has focused on designing better aggregation strategies to improve convergence rates and on alleviating the communication costs of FL, fewer efforts have been devoted to accelerating on-device training. This stage, which repeats hundreds of times (i.e., once per round) and can involve thousands of devices, accounts for the majority of the time required to train federated models and the totality of the energy consumption on the client side. In this work, we present the first study on the unique aspects that arise when introducing sparsity at training time in FL workloads. We then propose ZeroFL, a framework that relies on highly sparse operations to accelerate on-device training. Models trained with ZeroFL and 95% sparsity achieve up to 2.3% higher accuracy compared to competitive baselines obtained by adapting a state-of-the-art sparse training framework to the FL setting.
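To make the idea of sparse on-device training concrete, below is a minimal illustrative sketch, not ZeroFL's actual algorithm: a federated client runs local SGD while keeping weight tensors at a fixed sparsity level (e.g., 95%) by masking out the smallest-magnitude entries after each optimizer step. The function names (`apply_topk_mask`, `local_train`) and the per-step re-sparsification schedule are assumptions made for illustration only.

```python
# Sketch of sparse local training on a client (hypothetical, not ZeroFL's implementation).
import torch


def apply_topk_mask(tensor: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Keep only the (1 - sparsity) fraction of entries with the largest magnitude."""
    k = max(1, int(tensor.numel() * (1.0 - sparsity)))
    threshold = tensor.abs().flatten().topk(k).values.min()
    mask = (tensor.abs() >= threshold).to(tensor.dtype)
    return tensor * mask


def local_train(model, loader, sparsity=0.95, lr=0.01, epochs=1):
    """One round of local training with per-step magnitude pruning of weight tensors."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
            # Re-sparsify weights so subsequent forward/backward passes could
            # exploit sparse kernels on constrained hardware.
            with torch.no_grad():
                for p in model.parameters():
                    if p.dim() > 1:  # prune only weight matrices / conv filters
                        p.copy_(apply_topk_mask(p, sparsity))
    # The (sparse) local state would then be sent back to the server for aggregation.
    return model.state_dict()
```

In a full FL pipeline, the server would aggregate these sparse client updates (e.g., with FedAvg) and broadcast the result for the next round; how masks are chosen, synchronized, and exchanged is precisely where the design choices studied in the paper come in.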