Previous work has observed that the filters learned in the first layer of a CNN are qualitatively similar across different networks and tasks. We extend this finding and show a high quantitative similarity between the filters learned by different networks. We treat the CNN filters as a filter bank and measure the sensitivity of the filter bank to different frequencies. We show that the sensitivity profiles of different networks are almost identical, yet far from the profile at initialization. Remarkably, the profile remains the same even when the network is trained with random labels. To understand this effect, we derive an analytic formula for the sensitivity of the filters in the first layer of a linear CNN. We prove that when the average patch in images of the two classes is identical, the sensitivity profile of the first-layer filters is identical in expectation whether the network is trained with the true labels or with random labels, and depends only on the second-order statistics of image patches. We empirically demonstrate that the average patch assumption holds for realistic datasets. Finally, we show that the energy profile of filters in nonlinear CNNs is highly correlated with the energy profile of linear CNNs, and that our analysis of linear networks allows us to predict when representations learned by state-of-the-art networks trained on benchmark classification tasks will depend on the labels.
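To make the filter-bank measurement concrete, the following is a minimal sketch under stated assumptions, not the procedure used in this work. It assumes the sensitivity of a filter bank to a frequency can be approximated by the squared 2-D DFT magnitude at that frequency, summed over all filters and normalized to sum to one. The function names `frequency_sensitivity` and `profile_correlation` are hypothetical, and the choice of the DFT basis is an assumption.

```python
import numpy as np


def frequency_sensitivity(filters):
    """Normalized energy of a filter bank at each 2-D DFT frequency.

    filters: array of shape (n_filters, k, k), e.g. first-layer kernels
    of one input channel. Returns a (k, k) array that sums to 1.
    Assumes the bank is not identically zero.
    """
    # 2-D DFT of every filter over its two spatial axes.
    spectrum = np.fft.fft2(filters, axes=(-2, -1))
    # Energy per frequency, summed over the filters in the bank.
    energy = (np.abs(spectrum) ** 2).sum(axis=0)
    return energy / energy.sum()


def profile_correlation(profile_a, profile_b):
    """Pearson correlation between two flattened sensitivity profiles."""
    return float(np.corrcoef(profile_a.ravel(), profile_b.ravel())[0, 1])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-ins for first-layer filters of two networks (random here).
    bank_a = rng.normal(size=(16, 3, 3))
    bank_b = rng.normal(size=(16, 3, 3))
    corr = profile_correlation(
        frequency_sensitivity(bank_a), frequency_sensitivity(bank_b)
    )
    print("profile correlation:", corr)
```

Under these assumptions, two networks would be compared by extracting their first-layer kernels, computing one profile per network, and reporting the correlation between the profiles; the same comparison could be run against the profile at initialization or after training with random labels.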