Fisheye cameras offer a large field of view (LFOV) but suffer from severe image distortion, which degrades performance on many fisheye vision tasks. One solution is to adapt existing vision algorithms to fisheye images. However, most CNN-based and Transformer-based methods lack the capability to leverage distortion information efficiently. In this work, we propose a novel patch embedding method called Sector Patch Embedding (SPE), which conforms to the distortion pattern of fisheye images. Furthermore, we construct a synthetic fisheye dataset based on ImageNet-1K and evaluate several Transformer models on it. With SPE, the classification top-1 accuracy of ViT and PVT improves by 0.75% and 2.8%, respectively. The experiments show that the proposed sector patch embedding better perceives distortion and extracts features from fisheye images. Our method can be easily adopted by other Transformer-based models. Source code is available at https://github.com/IN2-ViAUn/Sector-Patch-Embedding.