Current state-of-the-art deep neural networks for image classification comprise 10 to 100 million learnable weights and are therefore inherently prone to overfitting. The weight count can be seen as a function of the number of channels, the spatial extent of the input, and the number of layers of the network. Due to the use of convolutional layers, the weight complexity usually scales linearly with respect to the resolution dimensions but remains quadratic with respect to the number of channels. Active research in recent years on multigrid-inspired ideas in deep neural networks has shown that, on the one hand, a significant number of weights can be saved by appropriate weight sharing and, on the other, that a hierarchical structure in the channel dimension can reduce the weight complexity to linear. In this work, we combine these multigrid ideas to introduce a joint framework of multigrid-inspired architectures that exploit multigrid structures in all relevant dimensions to achieve linear weight complexity scaling and drastically reduced weight counts. Our experiments show that this structured reduction in weight count reduces overfitting and thus yields improved performance over state-of-the-art ResNet architectures on typical image classification benchmarks at lower network complexity.
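To make the channel scaling concrete, consider the following illustrative calculation (the notation $c$, $k$, $g$ is ours and not part of the abstract, and the grouped structure is a simplified stand-in for the hierarchical channel coupling described above). A standard convolutional layer mapping $c$ input channels to $c$ output channels with $k \times k$ kernels has a dense channel coupling, whereas splitting the channels into $g$ independent groups reduces the count per layer:

\[
W_{\text{dense}} = k^2 \cdot c \cdot c = \mathcal{O}(c^2),
\qquad
W_{\text{grouped}} = g \cdot k^2 \cdot \frac{c}{g} \cdot \frac{c}{g} = k^2 \, \frac{c^2}{g}.
\]

Choosing the number of groups proportional to the channel count, $g \propto c$, gives $\mathcal{O}(c)$ weights per layer; a multigrid-style hierarchy in the channel dimension can restore coupling between groups across levels while keeping the total weight count linear in $c$.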