Paper Title
ResFed: Communication Efficient Federated Learning by Transmitting Deep Compressed Residuals
Paper Authors
Paper Abstract
Federated learning enables cooperative training among massively distributed clients by sharing their learned local model parameters. However, with increasing model size, deploying federated learning requires large communication bandwidth, which limits its use in wireless networks. To address this bottleneck, we introduce a residual-based federated learning framework (ResFed), where residuals rather than model parameters are transmitted over the communication network for training. In particular, we integrate two pairs of shared predictors for model prediction in both server-to-client and client-to-server communication. By employing a common prediction rule, both the locally and globally updated models are always fully recoverable at the clients and the server. We highlight that the residuals only encode the quasi-update of a model within a single communication round, and hence carry denser information and have lower entropy than model weights and gradients. Based on this property, we further compress the residuals lossily by sparsification and quantization and encode them for efficient communication. The experimental evaluation shows that, compared to standard federated learning, our ResFed requires remarkably lower communication cost and achieves better accuracy by leveraging the less sensitive residuals. For instance, to train a 4.08 MB CNN model on CIFAR-10 with 10 clients under a non-independent and identically distributed (Non-IID) setting, our approach achieves a compression ratio of over 700X in each communication round with minimal impact on accuracy. To reach an accuracy of 70%, it saves around 99% of the total communication volume, from 587.61 Mb to 6.79 Mb in up-streaming and to 4.61 Mb in down-streaming, on average over all clients.
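The following is a minimal sketch of one communication round as described in the abstract; it is not the authors' implementation. The choice of predictor (here simply the previously synchronized model), the top-k keep ratio, and the 8-bit uniform quantization are illustrative assumptions, and the entropy-coding step is omitted.

# Illustrative sketch only: compute a residual against a shared prediction,
# then compress it by top-k sparsification and uniform quantization.
import numpy as np

def compute_residual(updated_weights, predicted_weights):
    # Residual = locally updated weights minus the shared model prediction.
    return updated_weights - predicted_weights

def sparsify_topk(residual, keep_ratio=0.01):
    # Keep only the largest-magnitude entries (keep_ratio is an assumption).
    k = max(1, int(residual.size * keep_ratio))
    flat = residual.ravel().copy()
    threshold = np.partition(np.abs(flat), -k)[-k]
    flat[np.abs(flat) < threshold] = 0.0
    return flat.reshape(residual.shape)

def quantize_uniform(residual, num_bits=8):
    # Uniformly quantize residual entries to signed num_bits integers.
    scale = float(np.max(np.abs(residual)))
    scale = scale if scale > 0 else 1.0
    levels = 2 ** (num_bits - 1) - 1
    q = np.round(residual / scale * levels).astype(np.int8)
    return q, scale

def dequantize(q, scale, num_bits=8):
    levels = 2 ** (num_bits - 1) - 1
    return q.astype(np.float32) * scale / levels

# Example round: the client compresses its residual, the receiver adds the
# decompressed residual back onto the same shared prediction to recover the model.
predicted = np.random.randn(1000).astype(np.float32)
updated = predicted + 0.01 * np.random.randn(1000).astype(np.float32)
residual = compute_residual(updated, predicted)
q, scale = quantize_uniform(sparsify_topk(residual))
recovered = predicted + dequantize(q, scale)

Because both sides apply the same prediction rule, only the sparse, quantized residual needs to travel over the network, which is the source of the compression reported in the abstract.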