We study differentially private (DP) stochastic optimization (SO) with data containing outliers and loss functions that are not Lipschitz continuous. To date, the vast majority of work on DP SO assumes that the loss is Lipschitz (i.e., stochastic gradients are uniformly bounded), with error bounds that scale with the Lipschitz parameter of the loss. While this assumption is convenient, it is often unrealistic: in many practical problems where privacy is required, data may contain outliers or be unbounded, causing some stochastic gradients to have large norm. In such cases, the Lipschitz parameter may be prohibitively large, leading to vacuous excess risk bounds. Thus, building on a recent line of work [WXDX20, KLZ22], we make the weaker assumption that stochastic gradients have bounded $k$-th moments for some $k \geq 2$. Compared with works on DP Lipschitz SO, our excess risk scales with the $k$-th moment bound instead of the Lipschitz parameter of the loss, allowing for significantly faster rates in the presence of outliers. For convex and strongly convex loss functions, we provide the first asymptotically optimal excess risk bounds (up to a logarithmic factor). Moreover, in contrast to the prior works [WXDX20, KLZ22], our bounds do not require the loss function to be differentiable/smooth. We also devise an accelerated algorithm that runs in linear time and yields improved (compared to prior works) and nearly optimal excess risk for smooth losses. Additionally, our work is the first to address non-convex non-Lipschitz loss functions satisfying the Proximal-PL inequality; this covers some classes of neural nets, among other practical models. Our Proximal-PL algorithm has nearly optimal excess risk that almost matches the strongly convex lower bound. Lastly, we provide shuffle DP variations of our algorithms, which do not require a trusted curator (e.g. for distributed learning).
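To make the heavy-tailed setting concrete, the following is a minimal, hypothetical sketch of clipped noisy SGD on data whose stochastic gradients have bounded $k$-th moments but no finite Lipschitz bound. This is an illustrative baseline only, not the paper's algorithm; all function names and parameter values here are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def clipped_noisy_sgd(grad_fn, data, w0, steps, lr, clip, noise_std):
    """Illustrative private SGD with per-sample gradient clipping.

    clip      : clip threshold; with only a k-th moment bound on gradients,
                it would be tuned to the moment bound rather than to a
                (possibly infinite) Lipschitz constant.
    noise_std : std of Gaussian noise added to each clipped gradient
                (calibration to a target (eps, delta) is omitted here).
    """
    w = w0.copy()
    n = len(data)
    for _ in range(steps):
        x = data[rng.integers(n)]
        g = grad_fn(w, x)
        norm = np.linalg.norm(g)
        if norm > clip:                      # clip heavy-tailed gradients
            g = g * (clip / norm)
        g = g + rng.normal(0.0, noise_std, size=g.shape)
        w -= lr * g
    return w

# Toy strongly convex problem: least squares on heavy-tailed (Student-t) data,
# so stochastic gradients are unbounded and the loss is not Lipschitz, yet
# gradients have finite k-th moments for k < 3.
data = rng.standard_t(df=3, size=(1000, 2))
grad = lambda w, x: (w @ x - 1.0) * x        # gradient of 0.5 * (w.x - 1)^2
w_hat = clipped_noisy_sgd(grad, data, np.zeros(2), steps=2000,
                          lr=0.05, clip=5.0, noise_std=0.1)
```

Because every update is bounded by the clip threshold plus noise, the iterates stay controlled even when individual raw gradients are arbitrarily large, which is the basic mechanism that lets the error depend on a moment bound instead of a Lipschitz parameter.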