Disobeying the classical wisdom of statistical learning theory, modern deep neural networks generalize well even though they typically contain millions of parameters. Recently, it has been shown that the trajectories of iterative optimization algorithms can possess fractal structures, and their generalization error can be formally linked to the complexity of such fractals. This complexity is measured by the fractal's intrinsic dimension, a quantity usually much smaller than the number of parameters in the network. Even though this perspective provides an explanation for why overparametrized networks would not overfit, computing the intrinsic dimension (e.g., for monitoring generalization during training) is a notoriously difficult task, where existing methods typically fail even in moderate ambient dimensions. In this study, we consider this problem through the lens of topological data analysis (TDA) and develop a generic computational tool that is built on rigorous mathematical foundations. By making a novel connection between learning theory and TDA, we first illustrate that the generalization error can be equivalently bounded in terms of a notion called the 'persistent homology dimension' (PHD), where, compared with prior work, our approach does not require any additional geometrical or statistical assumptions on the training dynamics. Then, by utilizing recently established theoretical results and TDA tools, we develop an efficient algorithm to estimate PHD at the scale of modern deep neural networks and further provide visualization tools to help understand generalization in deep learning. Our experiments show that the proposed approach can efficiently compute a network's intrinsic dimension in a variety of settings, which is predictive of the generalization error.
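To make the estimator concrete, a useful fact is that the 0-dimensional persistence lifetimes of a Vietoris-Rips filtration coincide with the edge lengths of the Euclidean minimum spanning tree, so the alpha-weighted lifetime sum E^0_alpha can be computed without a dedicated persistent homology library, and the PH dimension can then be read off from how E^0_alpha grows under subsampling (if E^0_alpha(n) ~ n^m, the dimension estimate is alpha / (1 - m)). The Python sketch below illustrates this recipe; the function names (`e0_alpha`, `estimate_phd`), the choice alpha = 1, and the subsampling schedule are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform


def e0_alpha(points, alpha=1.0):
    """Alpha-weighted 0-dim persistence lifetime sum E^0_alpha.

    For a Vietoris-Rips filtration, 0-dimensional lifetimes equal the
    Euclidean minimum-spanning-tree edge lengths, so an MST suffices.
    """
    dist = squareform(pdist(points))   # pairwise Euclidean distances
    mst = minimum_spanning_tree(dist)  # sparse matrix of MST edge weights
    return float((mst.data ** alpha).sum())


def estimate_phd(points, alpha=1.0, n_sizes=10, seed=0):
    """Estimate the PH dimension via a log-log slope fit (illustrative).

    Fits log E^0_alpha(n) against log n over random subsamples and
    returns alpha / (1 - slope).
    """
    rng = np.random.default_rng(seed)
    total = len(points)
    sizes = np.linspace(max(total // 10, 2), total, n_sizes, dtype=int)
    log_n, log_e = [], []
    for n in sizes:
        idx = rng.choice(total, size=n, replace=False)
        log_n.append(np.log(n))
        log_e.append(np.log(e0_alpha(points[idx], alpha)))
    slope, _ = np.polyfit(log_n, log_e, 1)  # slope of log E vs. log n
    return alpha / (1.0 - slope)


# Example: points on a 2-D plane embedded in a 100-D ambient space; the
# estimate should be close to 2 despite the large ambient dimension.
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    planar = rng.standard_normal((1000, 2)) @ rng.standard_normal((2, 100))
    print(estimate_phd(planar))
```

In the intended use case, `points` would hold the flattened network weights recorded at successive iterations of training, so the estimate tracks the intrinsic dimension of the optimization trajectory rather than of the data.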