Prior work on the differential privacy analysis of randomized SGD algorithms relies on composition theorems, whose implicit (and unrealistic) assumption is that the internal state of the iterative algorithm is revealed to the adversary. As a result, the R\'enyi DP bounds derived by such composition-based analyses grow linearly with the number of training epochs. When the internal state of the algorithm is hidden, we prove a converging privacy bound for noisy stochastic gradient descent (on strongly convex, smooth loss functions). We show how to take advantage of privacy amplification by sub-sampling and randomized post-processing, and characterize the dynamics of the privacy bound for the "shuffle and partition" and "sample without replacement" stochastic mini-batch gradient descent schemes. We prove that, in these settings, our privacy bound converges exponentially fast and is substantially smaller than the composition bounds, notably after a moderate number of training epochs. Thus, unless the DP algorithm converges quickly, our analysis shows that hiding the internal state can significantly amplify differential privacy.
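To make the analyzed setting concrete, below is a minimal NumPy sketch of noisy mini-batch gradient descent with the two batching schemes named above, where only the final iterate is released (the hidden-state setting). The ridge-regression loss, the function names (`noisy_minibatch_sgd`, `make_batches`), and all hyperparameter values are illustrative assumptions for this sketch, not the paper's actual experimental configuration.

```python
import numpy as np

def make_batches(n, batch_size, scheme, rng):
    """Generate mini-batch index sets for one epoch.

    'shuffle_and_partition': shuffle all indices once, then split them into
        disjoint batches (each example appears exactly once per epoch).
    'sample_without_replacement': draw each batch independently, without
        replacement within the batch.
    """
    if scheme == "shuffle_and_partition":
        perm = rng.permutation(n)
        return [perm[i:i + batch_size] for i in range(0, n, batch_size)]
    elif scheme == "sample_without_replacement":
        return [rng.choice(n, size=batch_size, replace=False)
                for _ in range(n // batch_size)]
    raise ValueError(f"unknown scheme: {scheme}")

def noisy_minibatch_sgd(X, y, epochs=10, batch_size=32, lr=0.1,
                        sigma=1.0, clip=1.0, radius=10.0, lam=0.01,
                        scheme="shuffle_and_partition", seed=0):
    """Noisy projected mini-batch SGD on a ridge-regression loss
    (strongly convex and smooth). Intermediate iterates stay hidden;
    only the final iterate is released, matching the hidden-state setting."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(epochs):
        for batch in make_batches(n, batch_size, scheme, rng):
            # Per-example gradients of 0.5*(x^T theta - y)^2 + 0.5*lam*||theta||^2.
            residual = X[batch] @ theta - y[batch]
            grads = residual[:, None] * X[batch] + lam * theta
            # Clip each per-example gradient to norm <= clip (bounds sensitivity).
            scale = np.maximum(np.linalg.norm(grads, axis=1) / clip, 1.0)
            grad = (grads / scale[:, None]).mean(axis=0)
            # Gaussian noise calibrated to the clipped per-batch sensitivity.
            noise = rng.normal(0.0, sigma * clip / len(batch), size=d)
            theta -= lr * (grad + noise)
            # Project back onto a bounded domain, as in the convex analysis.
            nrm = np.linalg.norm(theta)
            if nrm > radius:
                theta *= radius / nrm
    return theta  # only this final state is revealed to the adversary
```

Under a composition-based analysis, the privacy loss of this loop would accumulate across every noisy update; the hidden-state analysis instead tracks only the distribution of the released final iterate, which is why the resulting bound can converge rather than grow with the number of epochs.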