Paper Title

Automatic Clipping: Differentially Private Deep Learning Made Easier and Stronger

Paper Authors

Bu, Zhiqi; Wang, Yu-Xiang; Zha, Sheng; Karypis, George

Paper Abstract

Per-example gradient clipping is a key algorithmic step that enables practical differentially private (DP) training for deep learning models. The choice of clipping threshold R, however, is vital for achieving high accuracy under DP. We propose an easy-to-use replacement, called automatic clipping, that eliminates the need to tune R for any DP optimizer, including DP-SGD, DP-Adam, DP-LAMB, and many others. The automatic variants are as private and computationally efficient as existing DP optimizers, but require no DP-specific hyperparameters and thus make DP training as amenable as standard non-private training. We give a rigorous convergence analysis of automatic DP-SGD in the non-convex setting, showing that it can enjoy an asymptotic convergence rate that matches that of standard SGD, under a symmetric gradient noise assumption on the per-sample gradients (commonly used in the non-DP literature). We demonstrate on various language and vision tasks that automatic clipping outperforms or matches the state-of-the-art, and can be easily employed with minimal changes to existing codebases.
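
The following is a minimal, illustrative sketch (not the authors' released code) of the contrast the abstract describes: standard per-example clipping requires a tuned threshold R, whereas the automatic variant normalizes each per-example gradient so no DP-specific threshold needs to be tuned. Function names, the stability constant gamma, and the toy data are assumptions for illustration only.

```python
# Sketch comparing standard (threshold-based) per-example clipping with a
# normalization-style "automatic" clipping, as summarized in the abstract.
# All names (abadi_clip, automatic_clip, gamma) are illustrative assumptions.
import numpy as np

def abadi_clip(per_example_grads, R):
    """Standard DP-SGD clipping: rescale each gradient by min(1, R / ||g_i||)."""
    clipped = [g * min(1.0, R / (np.linalg.norm(g) + 1e-12))
               for g in per_example_grads]
    return np.mean(clipped, axis=0)

def automatic_clip(per_example_grads, gamma=0.01):
    """Automatic clipping sketch: normalize each gradient, g_i / (||g_i|| + gamma);
    there is no threshold R to tune."""
    clipped = [g / (np.linalg.norm(g) + gamma) for g in per_example_grads]
    return np.mean(clipped, axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    grads = [rng.normal(size=10) for _ in range(8)]  # toy per-example gradients
    sigma = 1.0                                      # noise multiplier (illustrative)

    # Standard DP-SGD direction: clipped mean plus Gaussian noise scaled by sigma * R.
    R = 0.5
    noisy_std = abadi_clip(grads, R) + rng.normal(scale=sigma * R / len(grads), size=10)

    # Automatic variant: noise scale no longer depends on a hand-tuned R.
    noisy_auto = automatic_clip(grads) + rng.normal(scale=sigma / len(grads), size=10)

    print("standard  :", np.round(noisy_std, 3))
    print("automatic :", np.round(noisy_auto, 3))
```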
