The success of the transformer architecture in natural language processing has recently attracted attention in the computer vision field. Owing to its ability to learn long-range dependencies, the transformer has been used as a replacement for the widely used convolution operators. This replacement has proven successful in numerous tasks, in which several state-of-the-art methods rely on transformers for better learning. Within computer vision, the 3D field has also seen increasing use of the transformer for 3D convolutional neural networks and multi-layer perceptron networks. Although a number of surveys have focused on transformers in vision in general, 3D vision requires special attention because its data representation and processing differ from those of 2D vision. In this work, we present a systematic and thorough review of more than 100 transformer methods for different 3D vision tasks, including classification, segmentation, detection, completion, pose estimation, and others. We discuss transformer design in 3D vision, which allows it to process data with various 3D representations. For each application, we highlight the key properties and contributions of the proposed transformer-based methods. To assess the competitiveness of these methods, we compare their performance to common non-transformer methods on twelve 3D benchmarks. We conclude the survey by discussing open directions and challenges for transformers in 3D vision. In addition to the presented papers, we aim to maintain a frequently updated list of the latest relevant papers, along with their corresponding implementations, at: https://github.com/lahoud/3d-vision-transformers.