Modern CNNs learn the weights of vast numbers of convolutional operators. In this paper, we raise the fundamental question of whether this is actually necessary. We show that even in the extreme case where spatial filters are only randomly initialized and never updated, certain CNN architectures can be trained to surpass the accuracy of standard training. By reinterpreting pointwise ($1\times 1$) convolutions as operators that learn linear combinations (LC) of frozen (random) spatial filters, we are able to analyze these effects and propose a generic LC convolution block that allows tuning of the linear combination rate. Empirically, we show that this approach not only allows us to reach high test accuracies on CIFAR and ImageNet but also has favorable properties regarding model robustness, generalization, sparsity, and the total number of necessary weights. Additionally, we propose a novel weight sharing mechanism that allows a single weight tensor to be shared across all spatial convolution layers, massively reducing the number of weights.
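The core idea can be illustrated with a minimal PyTorch sketch: a frozen, randomly initialized spatial convolution followed by a trainable pointwise convolution that learns linear combinations of the random filters. This is an illustrative assumption of how such a block could look, not the paper's exact implementation; the class name `LCBlock`, the `lc_rate` parameter, and the way `lc_rate` scales the number of random filters are hypothetical choices made for this example.

```python
import torch
import torch.nn as nn

class LCBlock(nn.Module):
    """Illustrative LC convolution block (hypothetical implementation):
    a frozen, randomly initialized k x k spatial convolution followed by a
    trainable 1 x 1 (pointwise) convolution that learns linear
    combinations of the frozen random filters."""

    def __init__(self, in_channels, out_channels, kernel_size=3, lc_rate=1.0):
        super().__init__()
        # Number of frozen random spatial filters available for combination;
        # `lc_rate` (hypothetical name) tunes how many random filters the
        # pointwise layer may linearly combine per output channel.
        hidden = max(1, int(out_channels * lc_rate))
        self.spatial = nn.Conv2d(in_channels, hidden, kernel_size,
                                 padding=kernel_size // 2, bias=False)
        # Freeze the spatial filters: randomly initialized, never updated.
        self.spatial.weight.requires_grad_(False)
        # Trainable pointwise convolution = learned linear combinations.
        self.pointwise = nn.Conv2d(hidden, out_channels, kernel_size=1,
                                   bias=False)

    def forward(self, x):
        return self.pointwise(self.spatial(x))

# Usage: only the pointwise weights receive gradients during training.
block = LCBlock(64, 128, kernel_size=3, lc_rate=2.0)
y = block(torch.randn(1, 64, 32, 32))  # -> shape (1, 128, 32, 32)
```

Under this reading, the weight sharing mechanism mentioned above would correspond to all `spatial` layers referencing one shared frozen tensor, so that only the pointwise layers contribute trainable, per-layer weights.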