In many contexts, simpler models are preferable to more complex models, and controlling model complexity is the goal of many methods in machine learning, such as regularization, hyperparameter tuning, and architecture design. In deep learning, it has been difficult to understand the underlying mechanisms of complexity control, since many traditional measures are not naturally suited to deep neural networks. Here we develop the notion of geometric complexity, a measure of the variability of the model function computed using a discrete Dirichlet energy. Using a combination of theoretical arguments and empirical results, we show that many common training heuristics, such as parameter norm regularization, spectral norm regularization, flatness regularization, implicit gradient regularization, noise regularization, and the choice of parameter initialization, all act to control geometric complexity, providing a unifying framework in which to characterize the behavior of deep learning models.
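As a concrete illustration, the sketch below computes a discrete Dirichlet energy of this kind for a toy network in JAX: the average squared Frobenius norm of the input-output Jacobian of the model function over a batch of inputs. This is one plausible reading of the abstract's definition; the exact normalization and the choice of model outputs used in the paper may differ, and the names here (`geometric_complexity`, `apply_fn`) are illustrative assumptions rather than the paper's implementation.

```python
import jax
import jax.numpy as jnp

def geometric_complexity(apply_fn, params, inputs):
    """Discrete Dirichlet energy of the model function over a batch.

    Averages the squared Frobenius norm of the input-output Jacobian
    of x -> apply_fn(params, x) across the rows of `inputs`.
    """
    def sq_frobenius(x):
        # Jacobian of the network output with respect to a single input x.
        jac = jax.jacrev(lambda xi: apply_fn(params, xi))(x)
        return jnp.sum(jac ** 2)

    return jnp.mean(jax.vmap(sq_frobenius)(inputs))

# Toy usage: a one-hidden-layer network evaluated on random data.
def apply_fn(params, x):
    w1, b1, w2, b2 = params
    h = jnp.tanh(x @ w1 + b1)
    return h @ w2 + b2

key = jax.random.PRNGKey(0)
k1, k2, k3 = jax.random.split(key, 3)
params = (
    jax.random.normal(k1, (8, 16)) * 0.1,
    jnp.zeros(16),
    jax.random.normal(k2, (16, 3)) * 0.1,
    jnp.zeros(3),
)
batch = jax.random.normal(k3, (32, 8))
print(geometric_complexity(apply_fn, params, batch))
```

In this reading, a small value indicates a model function that varies little over the data, which is the sense in which the training heuristics listed above are said to control geometric complexity.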