We generalize the notion of average Lipschitz smoothness proposed by Ashlagi et al. (COLT 2021) by extending it to H\"older smoothness. This measure of the ``effective smoothness'' of a function is sensitive to the underlying distribution and can be dramatically smaller than its classic ``worst-case'' H\"older constant. We prove nearly tight upper and lower risk bounds in terms of the average H\"older smoothness, establishing the minimax rate in the realizable regression setting up to log factors; this was not previously known even in the special case of average Lipschitz smoothness. From an algorithmic perspective, since our notion of average smoothness is defined with respect to the unknown sampling distribution, the learner does not have an explicit representation of the function class, hence is unable to execute ERM. Nevertheless, we provide a learning algorithm that achieves the (nearly) optimal learning rate. Our results hold in any totally bounded metric space, and are stated in terms of its intrinsic geometry. Overall, our results show that the classic worst-case notion of H\"older smoothness can be essentially replaced by its average, yielding considerably sharper guarantees.
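As an illustrative sketch of the central quantity (following the average-Lipschitz construction of Ashlagi et al. with a H\"older exponent $\beta \in (0,1]$; the exact normalization used in the paper is an assumption here), write $B(x,r)$ for the metric ball of radius $r$ around $x$ and $\mu$ for the sampling distribution. The local H\"older constant of $f$ at $x$ and its average then take roughly the form
\[
  \Lambda^\beta_f(x) \;=\; \sup_{r>0}\,\frac{1}{r^\beta}\,\sup_{y \in B(x,r)} \bigl|f(x)-f(y)\bigr|,
  \qquad
  \overline{\Lambda}^\beta_f(\mu) \;=\; \mathbb{E}_{X\sim\mu}\bigl[\Lambda^\beta_f(X)\bigr],
\]
so that $\overline{\Lambda}^\beta_f(\mu) \le \sup_x \Lambda^\beta_f(x)$, the classic worst-case H\"older constant, with the gap between the two potentially unbounded.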