The sharpest known high probability generalization bounds for uniformly stable algorithms (Feldman and Vondr\'{a}k, 2018, 2019; Bousquet, Klochkov, and Zhivotovskiy, 2020) contain a generally inevitable sampling error term of order $\Theta(1/\sqrt{n})$. When applied to excess risk bounds, this leads to suboptimal results in several standard stochastic convex optimization problems. We show that if the so-called Bernstein condition is satisfied, the term $\Theta(1/\sqrt{n})$ can be avoided, and high probability excess risk bounds of order up to $O(1/n)$ are possible via uniform stability. Using this result, we show a high probability excess risk bound with the rate $O(\log n/n)$ for strongly convex and Lipschitz losses valid for \emph{any} empirical risk minimization method. This resolves a question of Shalev-Shwartz, Shamir, Srebro, and Sridharan (2009). We discuss how $O(\log n/n)$ high probability excess risk bounds are possible for projected gradient descent in the case of strongly convex and Lipschitz losses without the usual smoothness assumption.
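For context, one standard way to state the Bernstein condition referenced above is the following; the notation ($\ell$, $R$, $\mathcal{W}$, $w^\ast$, $B$) is introduced here purely for illustration and the paper's precise formulation may differ.
\[
\mathbb{E}_{z}\bigl[(\ell(w,z)-\ell(w^\ast,z))^{2}\bigr] \;\le\; B\,\bigl(R(w)-R(w^\ast)\bigr) \qquad \text{for all } w \in \mathcal{W},
\]
where $R(w)=\mathbb{E}_{z}\,\ell(w,z)$ is the population risk and $w^\ast \in \arg\min_{w\in\mathcal{W}} R(w)$. In particular, for $\lambda$-strongly convex and $L$-Lipschitz losses this condition is expected to hold with $B = 2L^{2}/\lambda$, since $|\ell(w,z)-\ell(w^\ast,z)| \le L\|w-w^\ast\|$ and $R(w)-R(w^\ast) \ge \tfrac{\lambda}{2}\|w-w^\ast\|^{2}$.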
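As a reminder of the algorithm mentioned in the final sentence, a standard projected (sub)gradient step on the empirical risk has the form below; the step-size choice and the use of subgradients are illustrative assumptions, not the paper's prescription.
\[
w_{t+1} \;=\; \Pi_{\mathcal{W}}\bigl(w_{t} - \eta_{t}\, g_{t}\bigr), \qquad g_{t} \in \partial_{w}\, \widehat{R}_{n}(w_{t}), \qquad \widehat{R}_{n}(w) = \frac{1}{n}\sum_{i=1}^{n} \ell(w, z_{i}),
\]
where $\Pi_{\mathcal{W}}$ denotes the Euclidean projection onto the constraint set and, for $\lambda$-strongly convex objectives, a typical choice is $\eta_{t} = 1/(\lambda t)$. Note that no smoothness of the loss is needed for the update to be well defined, since subgradients suffice.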