Our theoretical understanding of the inner workings of general convolutional neural networks (CNNs) is limited. Here we present a new stepping stone towards such an understanding in the form of a theory of learning in linear CNNs. By analyzing the gradient descent equations, we discover that using convolutions leads to a mismatch between the structure of the dataset and the structure of the network. We show that linear CNNs discover the statistical structure of the dataset with non-linear, stage-like transitions, and that the speed of discovery changes depending on this structural mismatch. Moreover, we find that the mismatch lies at the heart of what we call the 'dominant frequency bias', where linear CNNs arrive at these discoveries using only the dominant frequencies of the different structural parts present in the dataset. Our findings can help explain several characteristics of general CNNs, such as their shortcut learning and their tendency to rely on texture rather than shape.
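As a rough illustration of the stage-like transitions described above (not code from the paper): the sketch below trains a two-layer linear CNN, written as full-width circular convolutions, with plain full-batch gradient descent on a synthetic whitened dataset whose teacher filter has three Fourier components of different strength. All sizes, the learning rate, and the teacher filter are illustrative assumptions. Printing the Fourier magnitudes of the end-to-end filter should show each mode being picked up in its own, roughly sigmoidal stage, ordered by mode strength; since the toy teacher is itself a convolution, this sketch only illustrates the mode-by-mode dynamics and not the structural mismatch analysed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 32, 512                         # filter/input length, number of samples

def circ(w):
    """d x d matrix of the circular convolution with filter w."""
    return np.stack([np.roll(w, i) for i in range(d)])

# Illustrative teacher: a circular-convolution map whose filter has three
# Fourier components of very different strength (roughly 2.0, 1.0 and 0.3).
t = np.arange(d)
w_star = (2 / d) * (2.0 * np.cos(2 * np.pi * 1 * t / d)
                    + 1.0 * np.cos(2 * np.pi * 3 * t / d)
                    + 0.3 * np.cos(2 * np.pi * 7 * t / d))
X = rng.standard_normal((n, d))        # approximately whitened inputs
Y = X @ circ(w_star).T

# Two linear convolutional layers, small random initialisation.
w1 = 1e-4 * rng.standard_normal(d)
w2 = 1e-4 * rng.standard_normal(d)
lr = 5e-3

for step in range(401):
    W1, W2 = circ(w1), circ(w2)
    W = W2 @ W1                        # end-to-end map (also circulant)
    E = X @ W.T - Y
    loss = 0.5 * np.sum(E**2) / n

    # Gradient w.r.t. the end-to-end map, then w.r.t. each layer's matrix.
    dW = (E.T @ X) / n
    dW1, dW2 = W2.T @ dW, dW @ W1.T
    # Collapse back onto the shared filter taps (circulant weight sharing).
    g1 = np.array([np.trace(np.roll(dW1, -k, axis=1)) for k in range(d)])
    g2 = np.array([np.trace(np.roll(dW2, -k, axis=1)) for k in range(d)])
    w1 -= lr * g1
    w2 -= lr * g2

    if step % 50 == 0:
        # End-to-end filter = first row of W; its Fourier modes should grow
        # towards roughly 2.0, 1.0 and 0.3 in separate, sigmoid-like stages.
        mag = np.abs(np.fft.rfft(W[0]))
        print(f"step {step:4d}  loss {loss:8.4f}  "
              f"|f=1| {mag[1]:5.2f}  |f=3| {mag[3]:5.2f}  |f=7| {mag[7]:5.2f}")
```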