Convolutional neural networks typically contain several downsampling operators, such as strided convolutions or pooling layers, that progressively reduce the resolution of intermediate representations. This provides some shift-invariance while reducing the computational complexity of the whole architecture. A critical hyperparameter of such layers is their stride: the integer factor of downsampling. As strides are not differentiable, finding the best configuration requires either cross-validation or discrete optimization (e.g. architecture search), both of which rapidly become prohibitive as the search space grows exponentially with the number of downsampling layers. Hence, exploring this search space by gradient descent would allow finding better configurations at a lower computational cost. This work introduces DiffStride, the first downsampling layer with learnable strides. Our layer learns the size of a cropping mask in the Fourier domain, which effectively performs resizing in a differentiable way. Experiments on audio and image classification show the generality and effectiveness of our solution: we use DiffStride as a drop-in replacement for standard downsampling layers and outperform them. In particular, we show that introducing our layer into a ResNet-18 architecture maintains consistently high performance on CIFAR10, CIFAR100 and ImageNet even when training starts from poor random stride configurations. Moreover, formulating strides as learnable variables allows us to introduce a regularization term that controls the computational complexity of the architecture. We show how this regularization allows trading off accuracy for efficiency on ImageNet.
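To make the idea of resizing by cropping in the Fourier domain concrete, the following is a minimal NumPy sketch under stated assumptions. It is not the DiffStride layer itself: the stride here is a fixed number and the crop is a hard window, whereas the abstract describes a mask whose size is learned by gradient descent. The function name `fourier_crop_downsample` and its parameters are hypothetical and chosen only for illustration.

```python
import numpy as np


def fourier_crop_downsample(x, stride):
    """Downsample a 2D array by keeping only its low-frequency Fourier coefficients.

    Illustrative only: `stride` is a fixed (possibly non-integer) number and the
    crop is a hard window, so nothing here is learnable or differentiable with
    respect to the stride.
    """
    h, w = x.shape
    # Output resolution implied by the stride; non-integer strides are allowed.
    new_h = max(1, int(round(h / stride)))
    new_w = max(1, int(round(w / stride)))

    # Move the zero-frequency (DC) coefficient to the centre of the spectrum.
    spectrum = np.fft.fftshift(np.fft.fft2(x))

    # Crop a centred window so that only the lowest frequencies survive.
    top = h // 2 - new_h // 2
    left = w // 2 - new_w // 2
    cropped = spectrum[top:top + new_h, left:left + new_w]

    # Undo the shift and return to the spatial domain at the lower resolution.
    out = np.fft.ifft2(np.fft.ifftshift(cropped)).real

    # Rescale so that the mean intensity of the input is preserved.
    return out * (new_h * new_w) / (h * w)


image = np.ones((8, 8))
halved = fourier_crop_downsample(image, stride=2.0)
fractional = fourier_crop_downsample(image, stride=1.6)
print(halved.shape, fractional.shape)
```

Because the output size is set by the width of the retained frequency window rather than by an integer subsampling step, fractional downsampling factors arise naturally in this formulation. Making that window width a trainable quantity is what the abstract refers to as learnable strides.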