Differentially Private Stochastic Gradient Descent (DP-SGD) is a popular iterative algorithm used to train machine learning models while formally guaranteeing the privacy of users. However, the privacy analysis of DP-SGD makes the unrealistic assumption that all intermediate iterates (aka internal state) of the algorithm are released since, in practice, only the final trained model, i.e., the final iterate of the algorithm is released. In this hidden state setting, prior work has provided tighter analyses, albeit only when the loss function is constrained, e.g., strongly convex and smooth or linear. On the other hand, the privacy leakage observed empirically from hidden state DP-SGD, even when using non-convex loss functions, suggests that there is in fact a gap between the theoretical privacy analysis and the privacy guarantees achieved in practice. Therefore, it remains an open question whether hidden state privacy amplification for DP-SGD is possible for all (possibly non-convex) loss functions in general. In this work, we design a counter-example and show, both theoretically and empirically, that a hidden state privacy amplification result for DP-SGD for all loss functions in general is not possible. By carefully constructing a loss function for DP-SGD, we show that for specific loss functions, the final iterate of DP-SGD alone leaks as much information as the sequence of all iterates combined. Furthermore, we empirically verify this result by evaluating the privacy leakage from the final iterate of DP-SGD with our loss function and show that this exactly matches the theoretical upper bound guaranteed by DP. Therefore, we show that the current privacy analysis for DP-SGD is tight for general loss functions and conclude that no privacy amplification is possible for DP-SGD in general for all (possibly non-convex) loss functions.
翻译:暂无翻译