Generalization Bounds for Stochastic Gradient Descent via Localized $\varepsilon$-Covers

In this paper, we propose a new covering technique localized to the trajectories of SGD. This localization yields an algorithm-specific complexity, measured by the covering number, whose cardinality can be dimension-independent, in contrast to standard uniform covering arguments, which incur an exponential dependence on dimension. Based on this localized construction, we show that if the objective function is a finite perturbation of a piecewise strongly convex and smooth function with $P$ pieces (i.e., non-convex and non-smooth in general), the generalization error can be upper bounded by $O(\sqrt{(\log n\log(nP))/n})$, where $n$ is the number of data samples. In particular, this rate is independent of dimension and requires neither early stopping nor a decaying step size. Finally, we apply these results in various contexts and derive generalization bounds for multi-index linear models, multi-class support vector machines, and $K$-means clustering in both hard- and soft-label setups, improving the known state-of-the-art rates.
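To get a feel for how the stated rate behaves, the following minimal sketch evaluates $\sqrt{(\log n\log(nP))/n}$ for a few values of $n$ and $P$. The function name `rate` and the parameter `num_pieces` are hypothetical, and the constant hidden in the $O(\cdot)$ is set to 1 purely for illustration, so the printed numbers show scaling only and are not actual bounds.

```python
import math


def rate(n: int, num_pieces: int) -> float:
    """Evaluate sqrt(log(n) * log(n * P) / n).

    Assumption: the constant hidden by the O(.) notation is taken to be 1,
    so the value reflects the scaling in n and P, not a true error bound.
    """
    if n < 2 or num_pieces < 1:
        raise ValueError("need n >= 2 samples and at least one piece")
    return math.sqrt(math.log(n) * math.log(n * num_pieces) / n)


if __name__ == "__main__":
    # The dimension never appears as an argument: the rate depends
    # only on the sample size n and the number of pieces P.
    for n in (10**3, 10**4, 10**5, 10**6):
        for num_pieces in (1, 10, 1000):
            print(f"n={n:>7}  P={num_pieces:>4}  rate={rate(n, num_pieces):.4f}")
```

The number of pieces $P$ enters only through a logarithm, so under this sketch increasing $P$ by several orders of magnitude changes the value far less than a comparable increase in $n$.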
