Max-pooling operations are a core component of deep learning architectures. In particular, they are part of most convolutional architectures used in machine vision, since pooling is a natural approach to pattern detection problems. However, these architectures are not well understood from a theoretical perspective. For example, we do not understand when they can be globally optimized, and what effect over-parameterization has on generalization. Here we perform a theoretical analysis of a convolutional max-pooling architecture, proving that it can be globally optimized and that it generalizes well even for highly over-parameterized models. Our analysis focuses on a data-generating distribution inspired by pattern detection problems, in which a "discriminative" pattern must be detected among "spurious" patterns. We empirically validate that CNNs significantly outperform fully connected networks in our setting, as predicted by our theoretical results.
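To make the setting concrete, the following is a minimal sketch (not the paper's exact construction) of such a pattern-detection distribution and of a convolution-then-max-pooling unit acting on it. All names, dimensions, and the choice of random patterns are illustrative assumptions rather than details taken from the analysis.

```python
import numpy as np

rng = np.random.default_rng(0)

d, n_patches = 8, 10                     # patch dimension, patches per example (illustrative)
v_disc = rng.standard_normal(d)          # hypothetical "discriminative" pattern
spurious = rng.standard_normal((5, d))   # hypothetical pool of "spurious" patterns


def sample_example(positive: bool):
    """Draw one example: an (n_patches, d) array of patches and its label.

    Positive examples contain the discriminative pattern in one patch;
    negative examples contain only spurious patterns.
    """
    idx = rng.integers(0, len(spurious), size=n_patches)
    x = spurious[idx].copy()
    if positive:
        # overwrite one random patch with the discriminative pattern
        x[rng.integers(n_patches)] = v_disc
    return x, 1.0 if positive else -1.0


def max_pool_score(x, w):
    """Convolution-then-max-pooling unit: max over patches of <w, patch>."""
    return np.max(x @ w)


# With w aligned to the discriminative pattern, the pooled score tends to
# separate the two classes, since spurious patches typically correlate
# less strongly with v_disc than the pattern itself does.
w = v_disc / np.linalg.norm(v_disc)
x_pos, _ = sample_example(True)
x_neg, _ = sample_example(False)
print(max_pool_score(x_pos, w), max_pool_score(x_neg, w))
```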