We study differentially private (DP) stochastic optimization (SO) with loss functions whose worst-case Lipschitz parameter over all data points may be extremely large. To date, the vast majority of work on DP SO assumes that the loss is uniformly Lipschitz continuous over data (i.e., stochastic gradients are uniformly bounded over all data points). While this assumption is convenient, it often leads to pessimistic excess risk bounds: in many practical problems, outliers make the worst-case Lipschitz parameter of the loss extremely large, and error bounds for DP SO that scale with this parameter become vacuous. To address these limitations, this work provides near-optimal excess risk bounds that do not depend on the uniform Lipschitz parameter of the loss. Building on a recent line of work [WXDX20, KLZ22], we instead assume that stochastic gradients have bounded $k$-th order moments for some $k \geq 2$. Compared with works on uniformly Lipschitz DP SO, our excess risk scales with the $k$-th moment bound rather than the uniform Lipschitz parameter of the loss, allowing for significantly faster rates in the presence of outliers and/or heavy-tailed data. For convex and strongly convex loss functions, we provide the first asymptotically optimal excess risk bounds (up to a logarithmic factor). In contrast to [WXDX20, KLZ22], our bounds do not require the loss function to be differentiable/smooth. We also devise an accelerated algorithm for smooth losses that runs in linear time and has excess risk that is tight in certain practical parameter regimes. Additionally, our work is the first to address non-convex, non-uniformly Lipschitz loss functions satisfying the Proximal-PL inequality; this covers some practical machine learning models. Our Proximal-PL algorithm has near-optimal excess risk.
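To make the central assumption concrete, here is one standard way to formalize the contrast; the notation (the constraint set $\mathcal{W}$, data domain $\mathcal{X}$, data distribution $\mathcal{D}$, and moment bound $\gamma^{(k)}$) is illustrative rather than taken verbatim from the paper. Uniformly Lipschitz DP SO requires
\[
  \sup_{w \in \mathcal{W}} \, \sup_{x \in \mathcal{X}} \|\nabla f(w, x)\| \;\le\; L ,
\]
whereas the moment-bound assumption used here, following [WXDX20, KLZ22], only requires
\[
  \sup_{w \in \mathcal{W}} \Big( \mathbb{E}_{x \sim \mathcal{D}} \big[ \|\nabla f(w, x)\|^{k} \big] \Big)^{1/k} \;\le\; \gamma^{(k)}
\]
for some $k \geq 2$ (with subgradients in place of $\nabla f$ when the loss is not differentiable). In the presence of outliers or heavy tails, $\gamma^{(k)}$ can be far smaller than $L$, and the excess risk bounds scale with $\gamma^{(k)}$ rather than $L$.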