Convolution exploits the shift-equivalent prior of images, which has led to great success in image processing tasks. However, the pooling operations commonly used in convolutional neural networks (CNNs), such as max-pooling, average-pooling, and strided convolution, are not shift-equivalent. As a result, the shift-equivalence of a CNN is destroyed once convolutions and poolings are stacked. From a signal processing perspective, anti-aliasing is another essential property of pooling, yet recent pooling methods are neither shift-equivalent nor anti-aliasing. To address this issue, we propose frequency pooling, a new pooling method that is both shift-equivalent and anti-aliasing. Frequency pooling first transforms the features into the frequency domain, then removes the frequency components beyond the Nyquist frequency, and finally transforms the features back to the spatial domain. We prove that frequency pooling is shift-equivalent and anti-aliasing based on properties of the Fourier transform and the Nyquist frequency. Experiments on image classification show that frequency pooling improves both the accuracy of CNNs and their robustness to shifts.
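
The three steps described above (forward transform, removal of components beyond the Nyquist frequency of the smaller output grid, inverse transform) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `frequency_pool_2d`, the single-channel 2-D input, the rescaling that preserves the mean, and the choice to take the real part of the result are all assumptions made for illustration.

```python
import numpy as np


def frequency_pool_2d(x, out_h, out_w):
    """Hypothetical sketch: downsample a real 2-D feature map to (out_h, out_w)
    by keeping only its lowest spatial frequencies."""
    h, w = x.shape
    # Step 1: go to the frequency domain, with the zero frequency centered.
    spectrum = np.fft.fftshift(np.fft.fft2(x))
    # Step 2: keep the central out_h x out_w block, i.e. discard every
    # component beyond the Nyquist frequency of the smaller output grid.
    top = h // 2 - out_h // 2
    left = w // 2 - out_w // 2
    cropped = spectrum[top:top + out_h, left:left + out_w]
    # Step 3: return to the spatial domain on the smaller grid.
    y = np.fft.ifft2(np.fft.ifftshift(cropped))
    # Assumption: rescale so the mean activation is preserved, and drop the
    # imaginary part (it can be nonzero when an output size is even).
    return np.real(y) * (out_h * out_w) / (h * w)


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))
y = frequency_pool_2d(x, 4, 4)
# A circular shift of 2 pixels in the input corresponds to a shift of
# 1 pixel in the 2x-downsampled output.
y_shifted = frequency_pool_2d(np.roll(x, 2, axis=1), 4, 4)
print(np.allclose(y_shifted, np.roll(y, 1, axis=1)))
```

The check at the end uses circular shifts, for which the Fourier shift theorem applies exactly; shifts of real images that move content across the border behave only approximately this way.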