We propose a novel antialiasing method to increase shift invariance and prediction accuracy in convolutional neural networks. Specifically, we replace the first-layer combination "real-valued convolutions + max pooling" ($\mathbb{R}$Max) with "complex-valued convolutions + modulus" ($\mathbb{C}$Mod), which is stable to translations. To justify our approach, we claim that $\mathbb{C}$Mod and $\mathbb{R}$Max produce comparable outputs when the convolution kernel is band-pass and oriented (Gabor-like filter). In this context, $\mathbb{C}$Mod can be considered a stable alternative to $\mathbb{R}$Max. Thus, prior to antialiasing, we force the convolution kernels to adopt such a Gabor-like structure. The corresponding architecture is called a mathematical twin, because it employs a well-defined mathematical operator to mimic the behavior of the original, freely-trained model. Our antialiasing approach achieves superior accuracy on ImageNet and CIFAR-10 classification tasks, compared to prior methods based on low-pass filtering. Arguably, our approach's emphasis on retaining high-frequency details contributes to a better balance between shift invariance and information preservation, resulting in improved performance. Furthermore, it has a lower computational cost and memory footprint than concurrent work, making it a promising solution for practical implementation.
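For illustration only, the following is a minimal PyTorch sketch of the two first-layer operators contrasted above. The module names RMax and CMod mirror the abstract's notation; all hyperparameters (channel counts, kernel sizes, strides) are placeholder assumptions, and the sketch uses freely trainable kernels, omitting the Gabor-like constraint and the bias/ReLU details of the actual architectures.

```python
import torch
import torch.nn as nn


class RMax(nn.Module):
    """Baseline first layer: real-valued convolution + max pooling.
    Hyperparameters here are illustrative, not the paper's settings."""

    def __init__(self, in_ch=3, out_ch=64, k=7, stride=2, pool=3):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, k, stride=stride,
                              padding=k // 2, bias=False)
        self.pool = nn.MaxPool2d(pool, stride=2, padding=pool // 2)

    def forward(self, x):
        return self.pool(self.conv(x))


class CMod(nn.Module):
    """Proposed replacement: complex-valued convolution + modulus.
    The complex kernel is represented by two real-valued filter banks
    (real and imaginary parts). In the paper these are constrained to be
    band-pass and oriented (Gabor-like); here they are left free."""

    def __init__(self, in_ch=3, out_ch=64, k=7, stride=4):
        super().__init__()
        self.conv_re = nn.Conv2d(in_ch, out_ch, k, stride=stride,
                                 padding=k // 2, bias=False)
        self.conv_im = nn.Conv2d(in_ch, out_ch, k, stride=stride,
                                 padding=k // 2, bias=False)

    def forward(self, x):
        re, im = self.conv_re(x), self.conv_im(x)
        # Modulus of the complex response; stable to small input shifts.
        return torch.sqrt(re ** 2 + im ** 2 + 1e-12)


if __name__ == "__main__":
    x = torch.randn(1, 3, 224, 224)
    # Both operators downsample by a factor of 4 overall.
    print(RMax()(x).shape, CMod()(x).shape)
```

Note that CMod achieves the same overall subsampling in a single strided step, since the modulus replaces the pooling stage; this is what makes its output stable to translations of the input.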