Fully convolutional neural networks can process input of arbitrary size by applying a combination of downsampling and pooling. However, we find that fully convolutional image classifiers are not agnostic to the input size but rather show significant differences in performance: presenting the same image at different scales can result in different outcomes. A closer look reveals that there is no simple relationship between input size and model performance (no "bigger is better"), but that each network has a preferred input size, for which it shows the best results. We investigate this phenomenon by applying different methods, including spectral analysis of layer activations and probe classifiers, showing that there are characteristic features depending on the network architecture. From this we find that the size of discriminatory features critically influences how the inference process is distributed among the layers.
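The core observation can be illustrated with a toy model. The sketch below (a hypothetical single-filter "network" in NumPy, not the architectures studied in the paper) shows both halves of the claim: global average pooling makes the classifier head accept any input size, yet presenting the same image at a different scale changes the logits:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d(x, k):
    # valid 2D cross-correlation of a single-channel image with a 3x3 kernel
    h, w = x.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            out[i, j] = np.sum(x[i:i + 3, j:j + 3] * k)
    return out

def tiny_fcn(x, k, w_cls):
    # conv -> ReLU -> global average pooling -> linear classifier head
    feat = np.maximum(conv2d(x, k), 0.0)
    pooled = feat.mean()          # GAP collapses any spatial size to a scalar
    return pooled * w_cls         # per-class logits

k = rng.standard_normal((3, 3))
w_cls = rng.standard_normal(5)    # 5 hypothetical classes

img = rng.standard_normal((16, 16))
img2x = np.kron(img, np.ones((2, 2)))   # same image, nearest-neighbour upscaled 2x

logits_small = tiny_fcn(img, k, w_cls)
logits_large = tiny_fcn(img2x, k, w_cls)

# The network accepts both sizes (same logit shape) ...
assert logits_small.shape == logits_large.shape == (5,)
# ... but the logits differ with scale, so the prediction can change.
assert not np.allclose(logits_small, logits_large)
```

The mechanism is the same as in real fully convolutional classifiers: the convolution is size-agnostic and the pooling absorbs the remaining spatial extent, but the filter responses themselves depend on the scale at which discriminatory features appear in the input.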