Convolutional neural networks have established themselves in recent years as the state-of-the-art method for image classification, and on many datasets they even surpass humans at categorizing images. Unfortunately, the same architectures perform much worse when they must compare parts of an image to each other in order to classify it correctly. Until now, no well-formed theoretical argument has been presented to explain this deficiency. In this paper, we argue that convolutional layers are of little use for such problems, since comparison tasks are global by nature while convolutional layers are local by design. We use this insight to reformulate a comparison task as a sorting task, and draw on results about sorting networks to propose a lower bound on the number of parameters a neural network needs to solve comparison tasks in a generalizable way. We then use this lower bound to argue that attention, as well as iterative/recurrent processing, is needed to prevent a combinatorial explosion.