Encoding scale information explicitly into the representation learned by a convolutional neural network (CNN) is beneficial for many computer vision tasks, especially when dealing with multiscale inputs. In this paper, we study a scaling-translation-equivariant (ST-equivariant) CNN with joint convolutions across space and the scaling group, which is shown to be both sufficient and necessary to achieve equivariance for the regular representation of the scaling-translation group ST. To reduce the model complexity and computational burden, we decompose the convolutional filters under two pre-fixed separable bases and truncate the expansion to low-frequency components. A further benefit of the truncated filter expansion is the improved deformation robustness of the equivariant representation, a property which is theoretically analyzed and empirically verified. Numerical experiments demonstrate that the proposed scaling-translation-equivariant network with decomposed convolutional filters (ScDCFNet) achieves significantly improved performance in multiscale image classification and better interpretability than regular CNNs, at a reduced model size.
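To make the filter-decomposition idea concrete, here is a minimal NumPy sketch of expanding convolutional filters under a truncated separable basis. The choice of a low-frequency cosine/sine (Fourier-type) basis, the grid size, the truncation level, and all function names are illustrative assumptions; the paper's actual bases, scaling-group convolution, and training procedure are not reproduced here.

```python
import numpy as np

def fourier_basis_1d(size, num_freq):
    # Low-frequency cosine/sine atoms sampled on the grid [-1, 1].
    # (An illustrative stand-in for the paper's pre-fixed bases.)
    x = np.linspace(-1.0, 1.0, size)
    atoms = [np.ones(size)]
    for k in range(1, num_freq):
        atoms.append(np.cos(np.pi * k * x))
        atoms.append(np.sin(np.pi * k * x))
    return np.stack(atoms)          # shape: (2*num_freq - 1, size)

def separable_basis_2d(size, num_freq):
    # Outer products of 1-D atoms give a separable 2-D basis,
    # truncated to low-frequency components only.
    b = fourier_basis_1d(size, num_freq)
    return np.einsum('ix,jy->ijxy', b, b).reshape(-1, size, size)

def filters_from_coeffs(coeffs, basis):
    # Reconstruct full filters as linear combinations of basis atoms.
    # coeffs: (out_ch, in_ch, K); basis: (K, size, size)
    return np.einsum('oik,kxy->oixy', coeffs, basis)

basis = separable_basis_2d(size=5, num_freq=2)   # K = 9 truncated atoms
coeffs = np.random.randn(8, 3, basis.shape[0])   # learnable in a real network
filters = filters_from_coeffs(coeffs, basis)     # shape: (8, 3, 5, 5)
```

The key point the sketch conveys: only the `(out_ch, in_ch, K)` coefficient tensor would be learned, so with `K` smaller than the `5 × 5 = 25` spatial taps the parameter count drops, while the low-frequency truncation smooths the filters, which is the mechanism behind the deformation-robustness claim.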