The main success stories of deep learning, starting with ImageNet, depend on deep convolutional networks, which on certain tasks perform significantly better than traditional shallow classifiers, such as support vector machines, and also better than deep fully connected networks. But what is so special about deep convolutional networks? Recent results in approximation theory proved an exponential advantage of deep convolutional networks, with or without shared weights, in approximating functions with hierarchical locality in their compositional structure. More recently, the hierarchical structure was proved to be hard to learn from data, suggesting that it is a powerful prior embedded in the architecture of the network. These mathematical results, however, do not say which real-life tasks correspond to input-output functions with hierarchical locality. To evaluate this, we consider a set of visual tasks in which we disrupt the local organization of images via "deterministic scrambling" and then perform the task on the scrambled images, which are structurally altered in the same way for training and testing. For object recognition we find, as expected, that scrambling does not affect the performance of shallow or deep fully connected networks, whereas it does affect convolutional networks, which outperform them on unscrambled images. Not all tasks involving images are affected, however. Texture perception and global color estimation are much less sensitive to deterministic scrambling, showing that the functions underlying these tasks are not hierarchically local and, counter-intuitively, that these tasks are better approximated by networks that are not deep (texture) or not convolutional (color). Altogether, these results shed light on the importance of matching a network architecture, and the prior embedded in it, to the task to be learned.
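As an illustration of what "deterministic scrambling" could look like in practice, the sketch below applies one fixed pixel permutation to every image, so that training and test images are altered in exactly the same way. The abstract does not specify the scrambling scheme, so the seeded random permutation and the helper names `make_pixel_permutation` and `scramble` are assumptions made for this example only.

```python
import numpy as np


def make_pixel_permutation(height, width, seed=0):
    """Return a fixed permutation of the height*width pixel positions.

    Assumption: the scrambling is a single seeded random permutation;
    the same seed always yields the same permutation.
    """
    rng = np.random.default_rng(seed)
    return rng.permutation(height * width)


def scramble(images, perm):
    """Apply the same spatial permutation to every image.

    images: array of shape (N, H, W, C).
    perm:   permutation of range(H * W).
    Channels move together with their pixel, so each pixel's color is
    unchanged; only its location changes.
    """
    n, h, w, c = images.shape
    flat = images.reshape(n, h * w, c)
    return flat[:, perm, :].reshape(n, h, w, c)


# Hypothetical toy data standing in for a training and a test set.
data_rng = np.random.default_rng(123)
train = data_rng.integers(0, 256, size=(2, 8, 8, 3))
test = data_rng.integers(0, 256, size=(2, 8, 8, 3))

# One permutation, reused for both splits.
perm = make_pixel_permutation(8, 8, seed=0)
train_scrambled = scramble(train, perm)
test_scrambled = scramble(test, perm)
```

A permutation of this kind destroys local neighborhoods, the structure a convolutional prior relies on, while leaving per-channel pixel sums untouched. This is consistent with the abstract's observation that global color estimation is much less sensitive to scrambling.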