A central question of machine learning is how deep nets manage to learn tasks in high dimensions. An appealing hypothesis is that they achieve this feat by building a representation of the data in which information irrelevant to the task is lost. For image datasets, this view is supported by the observation that after (and not before) training, the neural representation becomes less and less sensitive to diffeomorphisms acting on images as the signal propagates through the net. This loss of sensitivity correlates with performance and, surprisingly, with a gain of sensitivity to white noise acquired during training. These facts are unexplained and, as we demonstrate, still hold when white noise is added to the images of the training set. Here, we (i) show empirically for various architectures that stability to image diffeomorphisms is achieved by spatial pooling in the first half of the net and by channel pooling in the second half, (ii) introduce a scale-detection task for a simple model of data where pooling is learned during training, which captures all empirical observations above, and (iii) compute in this model how stability to diffeomorphisms and noise scales with depth. The scalings are found to depend on the presence of strides in the net architecture. We find that the increased sensitivity to noise is due to the perturbing noise piling up during pooling, after being rectified by ReLU units.
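As an illustration of the kind of measurement discussed above, the following is a minimal sketch (not the paper's code) of how one might compare a representation's sensitivity to a smooth image deformation with its sensitivity to white noise of matched amplitude. The names `net`, `small_deformation`, and `relative_sensitivity` are hypothetical; `net` is assumed to be any PyTorch module returning the hidden representation of interest.

```python
# Minimal sketch, assuming a PyTorch model `net` whose forward pass returns the
# representation to probe, and a batch of images `x` of shape (B, C, H, W).
import torch
import torch.nn.functional as F


def small_deformation(x, amplitude=0.1):
    """Warp images with a smooth random displacement field
    (a crude proxy for the diffeomorphisms discussed above)."""
    b, c, h, w = x.shape
    # Low-resolution random displacements, upsampled so the field is smooth.
    disp = amplitude * torch.randn(b, 2, 4, 4, device=x.device)
    disp = F.interpolate(disp, size=(h, w), mode="bicubic", align_corners=False)
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, h, device=x.device),
        torch.linspace(-1, 1, w, device=x.device),
        indexing="ij",
    )
    grid = torch.stack((xs, ys), dim=-1).unsqueeze(0) + disp.permute(0, 2, 3, 1)
    return F.grid_sample(x, grid, align_corners=False)


@torch.no_grad()
def relative_sensitivity(net, x):
    """Ratio of the representation change under a small deformation to the
    change under Gaussian noise of the same pixel norm
    (smaller values indicate relatively more stability to diffeomorphisms)."""
    x_def = small_deformation(x)
    noise = torch.randn_like(x)
    # Rescale the noise so its per-image norm matches that of the deformation.
    noise = noise * (x_def - x).flatten(1).norm(dim=1).view(-1, 1, 1, 1) \
                  / noise.flatten(1).norm(dim=1).view(-1, 1, 1, 1)
    f0, f_def, f_noise = net(x), net(x_def), net(x + noise)
    d_diffeo = (f_def - f0).flatten(1).norm(dim=1).pow(2).mean()
    d_noise = (f_noise - f0).flatten(1).norm(dim=1).pow(2).mean()
    return (d_diffeo / d_noise).item()
```

Tracking this ratio layer by layer, before and after training, is one way to see where spatial and channel pooling make the representation insensitive to deformations while the response to noise grows.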