Convolutional neural networks have shown remarkable performance in recent years on various computer vision problems. However, the traditional convolutional neural network architecture lacks two critical properties: shift equivariance and shift invariance, which are broken by its downsampling and upsampling operations. Although data augmentation can help a model learn the latter property (invariance) empirically, a consistent and systematic way to achieve this goal is to design downsampling and upsampling layers that guarantee these properties by construction. Adaptive Polyphase Sampling (APS) laid the cornerstone for shift invariance and was later extended to shift equivariance with Learnable Polyphase up/downsampling (LPS), both applied to real-valued neural networks. In this paper, we extend LPS to complex-valued neural networks, both from a theoretical perspective and through a novel building block: a projection layer from $\mathbb{C}$ to $\mathbb{R}$ placed before the Gumbel Softmax. Finally, we evaluate this extension on polarimetric Synthetic Aperture Radar images across several computer vision problems: the invariance property on classification tasks, and the equivariance property on reconstruction and semantic segmentation tasks.
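To make the projection from $\mathbb{C}$ to $\mathbb{R}$ concrete, the following is a minimal sketch, assuming PyTorch, of a learnable polyphase downsampling layer for complex-valued feature maps: the complex polyphase components are projected to real values (here simply via the modulus), scored by a small real-valued convolution, and one component is selected with a straight-through Gumbel Softmax. The class name `ComplexLPSDown`, the modulus projection, and the scoring convolution are illustrative assumptions, not the authors' exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ComplexLPSDown(nn.Module):
    """Minimal sketch of learnable polyphase downsampling on complex features.

    Hypothetical design: the C -> R projection is the modulus and the phase
    scores come from a small real-valued convolution; the paper's exact
    projection layer may differ.
    """

    def __init__(self, channels: int, stride: int = 2, tau: float = 1.0):
        super().__init__()
        self.stride = stride
        self.tau = tau
        # Real-valued convolution scoring each polyphase component
        self.score = nn.Conv2d(channels, 1, kernel_size=3, padding=1)

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # z: complex-valued feature map of shape (B, C, H, W)
        s = self.stride
        # Stack the s*s polyphase components: (B, s*s, C, H/s, W/s)
        phases = torch.stack(
            [z[:, :, i::s, j::s] for i in range(s) for j in range(s)], dim=1
        )
        b, p = phases.shape[:2]
        # Projection from C to R before the Gumbel Softmax (modulus here)
        mag = phases.abs()
        # One scalar logit per polyphase component
        logits = self.score(mag.flatten(0, 1)).mean(dim=(-2, -1)).view(b, p)
        # Straight-through Gumbel Softmax: hard one-hot selection, soft gradient
        sel = F.gumbel_softmax(logits, tau=self.tau, hard=True)
        # Keep the selected complex component (selection applied as a weighted sum)
        return (phases * sel.view(b, p, 1, 1, 1).to(phases.dtype)).sum(dim=1)


# Example: stride-2 downsampling of an 8-channel complex feature map
x = torch.randn(2, 8, 32, 32, dtype=torch.complex64)
y = ComplexLPSDown(channels=8)(x)  # complex tensor of shape (2, 8, 16, 16)
```

Because the selection is made over polyphase components rather than at fixed grid positions, a circular shift of the input by the stride permutes the components instead of discarding information, which is what allows such a layer to preserve shift equivariance by construction.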