Current graph representation learning techniques use Graph Neural Networks (GNNs) to extract features from dataset embeddings. In this work, we examine the quality of these embeddings and assess how changing them affects the accuracy of GNNs. We explore different embedding extraction techniques for both images and text, and find that the choice of embedding biases the performance of different GNN architectures: the embedding therefore influences the selection of a GNN regardless of the underlying dataset. In addition, only some GNN models improve on the accuracy of models trained from scratch or fine-tuned on the underlying data without utilising the graph connections. As an alternative, we propose Graph-connected Network (GraNet) layers to better leverage existing unconnected models within a GNN, improving existing language and vision models by allowing neighbourhood aggregation. This gives the model the chance to reuse pre-trained weights where available, and we demonstrate that this approach improves accuracy compared to traditional GNNs: on Flickr v2, GraNet beats GAT2 and GraphSAGE by 7.7% and 1.7% respectively.
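The abstract does not specify GraNet's exact formulation. As a minimal, hypothetical sketch of the stated idea (wrapping a pre-trained encoder's node embeddings with neighbourhood aggregation so pre-trained weights can be reused), the following PyTorch code shows one plausible form using mean aggregation over a dense adjacency matrix. The class name `GraNetLayer` and all parameter names are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class GraNetLayer(nn.Module):
    """Sketch of a graph-connected layer: combines a node's own embedding
    (e.g. from a frozen pre-trained vision/language model) with the mean
    of its neighbours' embeddings."""
    def __init__(self, dim: int):
        super().__init__()
        self.self_proj = nn.Linear(dim, dim)   # transform the node's own features
        self.neigh_proj = nn.Linear(dim, dim)  # transform aggregated neighbour features
        self.act = nn.ReLU()

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x:   [N, dim] node embeddings from a pre-trained encoder
        # adj: [N, N] dense adjacency matrix (1.0 where an edge exists)
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)  # avoid division by zero
        neigh = (adj @ x) / deg                            # mean over neighbours
        return self.act(self.self_proj(x) + self.neigh_proj(neigh))

# Illustrative usage with stand-in data:
N, dim = 5, 16
x = torch.randn(N, dim)                    # stand-in for pre-trained embeddings
adj = (torch.rand(N, N) > 0.5).float()     # random toy graph
layer = GraNetLayer(dim)
out = layer(x, adj)                        # [N, dim] neighbourhood-aware features
```

Because the layer only consumes embeddings, the pre-trained encoder producing `x` can remain frozen or be fine-tuned end-to-end; the sketch makes no assumption either way.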