Per-example gradient clipping is a key algorithmic step that enables practical differentially private (DP) training for deep learning models. The choice of clipping norm $R$, however, has been shown to be vital for achieving high accuracy under DP. We propose an easy-to-use replacement, called AutoClipping, that eliminates the need to tune $R$ for any DP optimizer, including DP-SGD, DP-Adam, DP-LAMB, and many others. The automatic variants are as private and computationally efficient as existing DP optimizers, but require no DP-specific hyperparameters and thus make DP training as amenable as standard non-private training. We give a rigorous convergence analysis of automatic DP-SGD in the non-convex setting, showing that it enjoys an asymptotic convergence rate matching that of standard SGD. We also demonstrate on various language and vision tasks that automatic clipping outperforms or matches the state-of-the-art, and can be easily employed with minimal changes to existing codebases.
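To make the contrast concrete, below is a minimal sketch of the difference between standard threshold-based per-example clipping (which requires tuning $R$) and the normalization-style automatic rescaling described in the abstract. The function names, the PyTorch framing, and the small stability constant `gamma` are illustrative assumptions, not the paper's reference implementation.

```python
import torch

def clip_per_sample_grads(per_sample_grads, R):
    """Standard per-example clipping: rescale each sample's gradient so its
    L2 norm is at most R (the DP-specific threshold that must be tuned)."""
    norms = per_sample_grads.flatten(1).norm(dim=1)           # shape: (batch,)
    factors = torch.clamp(R / (norms + 1e-12), max=1.0)       # min(1, R / ||g_i||)
    return per_sample_grads * factors.view(-1, *([1] * (per_sample_grads.dim() - 1)))

def auto_clip_per_sample_grads(per_sample_grads, gamma=0.01):
    """Automatic (normalization-style) rescaling: multiply each per-sample
    gradient by 1 / (||g_i|| + gamma), removing R as a hyperparameter.
    gamma is a small stability constant assumed for this sketch."""
    norms = per_sample_grads.flatten(1).norm(dim=1)
    factors = 1.0 / (norms + gamma)
    return per_sample_grads * factors.view(-1, *([1] * (per_sample_grads.dim() - 1)))

# Hypothetical usage with per-sample gradients of shape (batch, *param_shape):
g = torch.randn(8, 3, 4)
rescaled = auto_clip_per_sample_grads(g)
print(rescaled.flatten(1).norm(dim=1))  # every norm is close to 1, so the
                                        # per-sample sensitivity stays bounded
                                        # without choosing a clipping norm R
```

Because the rescaled per-sample gradients all have bounded norm, the usual Gaussian noise addition and aggregation of a DP optimizer can be applied unchanged; only the clipping step differs from standard DP-SGD.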