We study the excess capacity of deep networks in the context of supervised classification. That is, given a capacity measure of the underlying hypothesis class -- in our case, Rademacher complexity -- to what extent can we (a priori) constrain this class while maintaining an empirical error comparable to the unconstrained setting? To assess excess capacity in modern architectures, we first extend an existing generalization bound to accommodate function composition and addition, as well as the specific structure of convolutions. This then facilitates studying residual networks through the lens of the accompanying capacity measure. The key quantities driving this measure are the Lipschitz constants of the layers and the (2,1) group norm distances of the convolution weights to their initializations. We show that (1) these quantities can be kept surprisingly small and (2) excess capacity unexpectedly increases with task difficulty, which points towards an unnecessarily large capacity of unconstrained models.
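To make the distance term concrete, the following is a minimal sketch (not from the paper) of computing the (2,1) group norm distance of a convolution kernel to its initialization. It assumes one group per output filter, i.e., the kernel is flattened to a matrix with one row per output channel and the (2,1) norm sums the l2 norms of the rows; the function names and this grouping are illustrative assumptions.

```python
import numpy as np

def group_norm_21(w: np.ndarray) -> float:
    """(2,1) group norm of a matrix: sum of the l2 norms of its rows."""
    return float(np.linalg.norm(w, ord=2, axis=1).sum())

def conv_distance_to_init(w: np.ndarray, w0: np.ndarray) -> float:
    """(2,1) group norm distance of a conv kernel to its initialization.

    Kernels of shape (out_channels, in_channels, kh, kw) are flattened to
    one row per output channel -- an assumed grouping, for illustration.
    """
    diff = (w - w0).reshape(w.shape[0], -1)
    return group_norm_21(diff)

# Usage: distance of a slightly perturbed 3x3 kernel to its initialization.
rng = np.random.default_rng(0)
w0 = rng.standard_normal((64, 32, 3, 3))        # initialization
w = w0 + 0.01 * rng.standard_normal(w0.shape)   # stand-in for trained weights
print(conv_distance_to_init(w, w0))
```

Keeping this distance small (together with the layer Lipschitz constants) is what constrains the hypothesis class in the bound; the sketch only illustrates how the quantity itself is measured.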