Convolutional neural networks (CNNs) are widely used in vision tasks such as image classification and semantic segmentation. Unfortunately, standard 2D CNNs are not well suited to spherical signals such as panorama images or spherical projections, because samples on a sphere do not lie on a regular grid. In this paper, we present the Spherical Transformer, which transforms spherical signals into vectors that standard CNNs can process directly, so that many well-designed CNN architectures can be reused across tasks and datasets through pretraining. To this end, the proposed method first uses a locally structured sampling scheme such as HEALPix to construct a transformer grid from each spherical point and its adjacent points, and then transforms the spherical signals into vectors through this grid. With the Spherical Transformer module in place, multiple CNN architectures can be used directly. We evaluate our approach on spherical MNIST recognition, 3D object classification, and omnidirectional image semantic segmentation. For 3D object classification, we further propose a rendering-based projection method to improve performance and a rotation-equivariant model to improve robustness to rotation. Experimental results on the three tasks show that our approach achieves superior performance over state-of-the-art methods.
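The grid-then-gather idea described above can be illustrated with a minimal sketch. It is not the paper's implementation: as a stated assumption, a closed ring of points with two neighbors each stands in for HEALPix sampling, and every function name here (`ring_neighbors`, `build_transformer_grid`, `transform_signal`, `apply_shared_filter`) is hypothetical.

```python
def ring_neighbors(n_points):
    """Toy neighbor table (assumption): a closed ring stands in for HEALPix adjacency."""
    return [[(i - 1) % n_points, (i + 1) % n_points] for i in range(n_points)]


def build_transformer_grid(neighbors):
    """Row i holds the index of point i followed by the indices of its neighbors."""
    return [[i, *adjacent] for i, adjacent in enumerate(neighbors)]


def transform_signal(signal, grid):
    """Gather the signal through the grid: one fixed-length vector per point."""
    return [[signal[j] for j in row] for row in grid]


def apply_shared_filter(vectors, weights):
    """Apply the same weight vector to every row, as a convolution filter would."""
    return [sum(w * v for w, v in zip(weights, row)) for row in vectors]


# One scalar value per sampled point.
signal = [0.0, 1.0, 4.0, 9.0, 16.0, 25.0]
grid = build_transformer_grid(ring_neighbors(len(signal)))
vectors = transform_signal(signal, grid)
smoothed = apply_shared_filter(vectors, [0.5, 0.25, 0.25])
```

Because every row of `vectors` has the same length and ordering, a single set of weights can be shared across all points, which is the property that lets a standard convolution layer operate on the transformed signal. With real HEALPix sampling, the neighbor table would come from the sampling scheme rather than from this ring.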