Graph Neural Networks (GNNs) have gained momentum in graph representation learning and boosted the state of the art in a variety of areas, such as data mining (\emph{e.g.,} social network analysis and recommender systems), computer vision (\emph{e.g.,} object detection and point cloud learning), and natural language processing (\emph{e.g.,} relation extraction and sequence learning). With the emergence of Transformers in natural language processing and computer vision, graph Transformers embed a graph structure into the Transformer architecture to overcome the limitations of local neighborhood aggregation while avoiding strict structural inductive biases. In this paper, we present a comprehensive review of GNNs and graph Transformers in computer vision from a task-oriented perspective. Specifically, we divide their applications in computer vision into five categories according to the modality of input data, \emph{i.e.,} 2D natural images, videos, 3D data, vision + language, and medical images. In each category, we further divide the applications according to a set of vision tasks. Such a task-oriented taxonomy allows us to examine how each task is tackled by different GNN-based approaches and how well these approaches perform. After covering the necessary preliminaries, we provide the definitions and challenges of the tasks, an in-depth review of the representative approaches, and discussions of insights, limitations, and future directions.