论文标题
广义Lagrange编码计算:用于弹性,安全和私人计算的灵活计算 - 通信权衡
Generalized Lagrange Coded Computing: A Flexible Computation-Communication Tradeoff for Resilient, Secure, and Private Computation
论文作者
论文摘要
我们考虑在具有多个输入的大量数据集上评估任意多元多项式的问题,该数据集是在具有主节点和多个Worker节点的分布式计算系统上。提出了广义的Lagrange编码计算(GLCC)代码,以同时为不返回计算结果的Stragglers提供弹性,对对福利的结果进行了对对抗性工人的保障,并为其受益的信息理论私密性,以及数据集的信息理论隐私。 GLCC代码是通过将数据集首先将数据集划分为多个组来构建的,然后使用精心设计的插值多项式编码数据集,并向每个工作人员共享多个编码数据点,因此可以在主人的主人中消除各组的干扰计算结果。特别是,GLCC代码包括最先进的Lagrange编码计算(LCC)代码作为一种特殊情况,并在优化系统效率方面在通信和计算开销之间表现出更灵活的权衡。此外,我们将GLCC应用于机器学习模型的分布式培训,并证明GLCC代码的速度达到了$ 2.5 \ text { - } 3.9 \ times $ by LCC代码的培训时间,跨培训时间,用于在不同数据集,模型体系结构和Straggler模式上进行培训图像分类器的实验。
We consider the problem of evaluating arbitrary multivariate polynomials over a massive dataset containing multiple inputs, on a distributed computing system with a master node and multiple worker nodes. Generalized Lagrange Coded Computing (GLCC) codes are proposed to simultaneously provide resiliency against stragglers who do not return computation results in time, security against adversarial workers who deliberately modify results for their benefit, and information-theoretic privacy of the dataset amidst possible collusion of workers. GLCC codes are constructed by first partitioning the dataset into multiple groups, then encoding the dataset using carefully designed interpolating polynomials, and sharing multiple encoded data points to each worker, such that interference computation results across groups can be eliminated at the master. Particularly, GLCC codes include the state-of-the-art Lagrange Coded Computing (LCC) codes as a special case, and exhibit a more flexible tradeoff between communication and computation overheads in optimizing system efficiency. Furthermore, we apply GLCC to distributed training of machine learning models, and demonstrate that GLCC codes achieve a speedup of up to $2.5\text{--}3.9\times$ over LCC codes in training time, across experiments for training image classifiers on different datasets, model architectures, and straggler patterns.