Applying convolutional neural networks to 360° images can yield sub-optimal performance because the planar projection they require introduces distortions. The distortion worsens when the 360° image is rotated. Many convolution-based studies therefore try to reduce these distortions in order to learn accurate representations. In contrast, we leverage the transformer architecture to solve image classification problems for 360° images. The proposed transformer has two advantages. First, because it samples pixels directly from the sphere surface, it does not require the error-prone planar projection. Second, our sampling method, based on regular polyhedra, yields low rotation-equivariance error, because specific rotations reduce to permutations of faces. We validate our network experimentally on two aspects. First, we show that a transformer with highly uniform sampling can help reduce the distortion. Second, we demonstrate that the transformer architecture can achieve rotation equivariance under specific rotations. We compare our method with other state-of-the-art algorithms on the SPH-MNIST, SPH-CIFAR, and SUN360 datasets and show that it is competitive.
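The claim that specific rotations reduce to permutations of faces can be illustrated with a minimal sketch. It assumes a cube as the regular polyhedron and represents each face by its center; the helper names `rotation_z` and `face_permutation` are hypothetical and are not taken from the paper's implementation.

```python
import numpy as np

# Face centers of a cube (assumed here as the regular polyhedron),
# one unit vector per face.
FACE_CENTERS = np.array([
    [1, 0, 0], [-1, 0, 0],
    [0, 1, 0], [0, -1, 0],
    [0, 0, 1], [0, 0, -1],
], dtype=float)


def rotation_z(angle_rad):
    """3x3 rotation matrix about the z-axis."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def face_permutation(rotation, centers=FACE_CENTERS, tol=1e-8):
    """Return perm such that rotation @ centers[i] == centers[perm[i]].

    Returns None when the rotation does not map every face center onto
    another face center, i.e. it is not a symmetry of the polyhedron.
    """
    rotated = centers @ rotation.T  # row i holds rotation @ centers[i]
    # Pairwise distances between rotated centers and original centers.
    dists = np.linalg.norm(rotated[:, None, :] - centers[None, :, :], axis=-1)
    perm = dists.argmin(axis=1)
    if np.any(dists[np.arange(len(centers)), perm] > tol):
        return None
    return perm


perm_90 = face_permutation(rotation_z(np.pi / 2))  # a symmetry of the cube
perm_30 = face_permutation(rotation_z(np.pi / 6))  # not a symmetry
print(perm_90)
print(perm_30)
```

Under these assumptions, a 90° rotation about the z-axis only reorders the faces, so a model that processes one token per face sees the same tokens in a permuted order. A 30° rotation does not map faces onto faces, so no such permutation exists and some equivariance error is expected.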