The widespread success of convolutional neural networks may largely be attributed to their intrinsic property of translation equivariance. However, convolutions are not equivariant to variations in scale and fail to generalize to objects of different sizes. Despite recent advances in this field, it remains unclear how well current methods generalize to unobserved scales on real-world data and to what extent scale equivariance plays a role. To address this, we propose the novel Scaled and Translated Image Recognition (STIR) benchmark based on four different domains. Additionally, we introduce a new family of models that applies many re-scaled kernels with shared weights in parallel and then selects the most appropriate one. Our experimental results on STIR show that both the existing and proposed approaches can improve generalization across scales compared to standard convolutions. We also demonstrate that our family of models is able to generalize well towards larger scales and improve scale equivariance. Moreover, due to their unique design we can validate that kernel selection is consistent with input scale. Even so, none of the evaluated models maintain their performance for large differences in scale, demonstrating that a general understanding of how scale equivariance can improve generalization and robustness is still lacking.
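The mechanism described above, re-scaling one shared kernel to several sizes, convolving with each copy in parallel, and keeping the most appropriate response, can be illustrated with a short sketch. This is a minimal, hypothetical example rather than the paper's actual implementation: the module name ScaleSelectionConv2d, the scale grid, and the max-over-scales selection rule are all assumptions made for illustration.

    # Minimal sketch (PyTorch) of a scale-selection convolution, assuming
    # max-over-scales selection; names and the scale grid are illustrative.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ScaleSelectionConv2d(nn.Module):
        def __init__(self, in_channels, out_channels, kernel_size=3,
                     scales=(0.5, 1.0, 2.0)):
            super().__init__()
            self.scales = scales
            # One shared weight tensor; all re-scaled kernels derive from it.
            self.weight = nn.Parameter(torch.empty(
                out_channels, in_channels, kernel_size, kernel_size))
            nn.init.kaiming_uniform_(self.weight, a=5 ** 0.5)

        def forward(self, x):
            responses = []
            for s in self.scales:
                # Resize the shared kernel; force an odd size so that
                # symmetric padding preserves the spatial dimensions.
                k = max(3, int(round(self.weight.shape[-1] * s)) | 1)
                w = F.interpolate(self.weight, size=(k, k),
                                  mode='bilinear', align_corners=False)
                responses.append(F.conv2d(x, w, padding=k // 2))
            # Select the strongest response across scales per location.
            return torch.stack(responses, dim=0).max(dim=0).values

Because every scale shares the same underlying weights, the layer adds no parameters beyond a single kernel, and the argmax over scales can be inspected to check whether kernel selection tracks the input scale, as the abstract describes.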