Providing generalization guarantees for modern neural networks has been a crucial task in statistical learning. Recently, several studies have attempted to analyze the generalization error in such settings by using tools from fractal geometry. While these works have successfully introduced new mathematical tools to apprehend generalization, they heavily rely on a Lipschitz continuity assumption, which in general does not hold for neural networks and might make the bounds vacuous. In this work, we address this issue and prove fractal geometry-based generalization bounds without requiring any Lipschitz assumption. To achieve this goal, we build upon a classical covering argument in learning theory and introduce a data-dependent fractal dimension. Despite introducing a significant amount of technical complications, this new notion lets us control the generalization error (over either fixed or random hypothesis spaces) along with certain mutual information (MI) terms. To provide a clearer interpretation of the newly introduced MI terms, as a next step, we introduce a notion of "geometric stability" and link our bounds to the prior art. Finally, we make a rigorous connection between the proposed data-dependent dimension and topological data analysis tools, which then enables us to compute the dimension in a numerically efficient way. We support our theory with experiments conducted in various settings.
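To illustrate how such a fractal dimension can be estimated numerically via topological data analysis, below is a minimal sketch, not the paper's exact estimator. It estimates a 0-dimensional persistent-homology (PH0) dimension of a finite point cloud (for instance, the loss or weight vectors visited during training), using the known fact that the finite PH0 lifetimes of a Vietoris-Rips filtration coincide with the edge lengths of the Euclidean minimum spanning tree. The function names, the choice of `alpha`, and the log-log power-law fit are illustrative assumptions.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import minimum_spanning_tree


def ph0_total_lifetime(points, alpha=1.0):
    """Sum of alpha-powered PH0 lifetimes of a point cloud.

    For a finite point cloud (with distinct points), the PH0 lifetimes of the
    Vietoris-Rips filtration equal the Euclidean MST edge lengths, so this
    computes E_alpha^0 from the PH-dimension literature.
    """
    dists = squareform(pdist(points))          # pairwise Euclidean distances
    mst = minimum_spanning_tree(dists)         # sparse matrix of MST edges
    return np.power(mst.data, alpha).sum()


def estimate_ph_dimension(points, sample_sizes, alpha=1.0, seed=0):
    """Estimate the PH0 dimension from the scaling E_alpha^0(n) ~ n^s.

    Under the power-law assumption, the dimension is alpha / (1 - s),
    where s is the slope of log E_alpha^0(n) against log n.
    """
    rng = np.random.default_rng(seed)
    log_n, log_e = [], []
    for n in sample_sizes:
        idx = rng.choice(len(points), size=n, replace=False)
        log_n.append(np.log(n))
        log_e.append(np.log(ph0_total_lifetime(points[idx], alpha)))
    s, _ = np.polyfit(log_n, log_e, deg=1)     # slope of the log-log fit
    return alpha / (1.0 - s)


# Hypothetical usage: `trajectory` would hold vectors collected during training.
# trajectory = np.random.randn(2000, 10)
# dim = estimate_ph_dimension(trajectory, sample_sizes=[200, 400, 800, 1600])
```

In the sketch, subsampling at several sizes and fitting the slope follows the standard PH-dimension estimation recipe; applying it to a data-dependent pseudo-metric, as in the paper, would replace the Euclidean distances with distances computed from the training data.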