Learning meaningful representations of free-hand sketches remains a challenging task given the signal sparsity and the high-level abstraction of sketches. Existing techniques have focused on exploiting either the static nature of sketches with Convolutional Neural Networks (CNNs) or the temporal sequential property with Recurrent Neural Networks (RNNs). In this work, we propose a new representation of sketches as multiple sparsely connected graphs. We design a novel Graph Neural Network (GNN), the Multi-Graph Transformer (MGT), for learning representations of sketches from multiple graphs, which simultaneously capture global and local geometric stroke structures as well as temporal information. We report extensive numerical experiments on a sketch recognition task to demonstrate the performance of the proposed approach. In particular, MGT applied to 414k sketches from Google QuickDraw: (i) achieves a small recognition gap to the CNN-based performance upper bound (72.80% vs. 74.22%), and (ii) outperforms all RNN-based models by a significant margin. To the best of our knowledge, this is the first work proposing to represent sketches as graphs and apply GNNs for sketch recognition. Code and trained models are available at https://github.com/PengBoXiangShang/multigraph_transformer.
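To make the graph representation concrete, below is a minimal illustrative sketch (not the authors' released code; the function name and parameters are hypothetical) of how a QuickDraw-style point sequence could be turned into multiple sparse graphs: one adjacency matrix linking temporally consecutive points within a stroke, and one linking each point to its k nearest spatial neighbours.

```python
# Minimal, illustrative construction of two sparse sketch graphs.
# Assumes a sketch given as an (N, 2) array of point coordinates plus a
# per-point pen-up flag marking stroke ends; names here are hypothetical.
import numpy as np

def build_sketch_graphs(points, pen_up, k=3):
    """points: (N, 2) array of x, y coordinates.
    pen_up: (N,) boolean array, True where a stroke ends.
    Returns two (N, N) binary adjacency matrices."""
    n = len(points)

    # Temporal graph: connect consecutive points within the same stroke.
    temporal = np.zeros((n, n), dtype=np.float32)
    for i in range(n - 1):
        if not pen_up[i]:  # do not connect across stroke boundaries
            temporal[i, i + 1] = temporal[i + 1, i] = 1.0

    # Geometric graph: connect each point to its k nearest spatial neighbours.
    geometric = np.zeros((n, n), dtype=np.float32)
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    for i in range(n):
        neighbours = np.argsort(dists[i])[1:k + 1]  # skip the point itself
        geometric[i, neighbours] = 1.0
        geometric[neighbours, i] = 1.0  # keep the graph symmetric

    return temporal, geometric
```

In a model like MGT, such sparse adjacency structures could serve as masks that restrict attention to graph neighbours inside the Transformer layers; the sketch above only covers graph construction, not the network itself.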