Training labels for graph embedding algorithms can be costly to obtain in many practical scenarios. Active learning (AL) algorithms help select the most useful labels for training while keeping the total number of label queries within a fixed budget. The existing Active Graph Embedding framework uses a centrality score, a density score, and an entropy score to evaluate the value of unlabeled nodes, and it has been shown to improve the node classification performance of Graph Convolutional Networks. However, when evaluating the importance of unlabeled nodes, it does not consider the influence of the existing labeled nodes on the value of unlabeled nodes. In other words, the informativeness score computed for a given unlabeled node is always the same, regardless of the labeled node set. To address this limitation, we introduce three dissimilarity-based information scores for active learning: the feature dissimilarity score (FDS), the structure dissimilarity score (SDS), and the embedding dissimilarity score (EDS). We find that these three scores take into account the influence of the labeled set on the value of unlabeled candidates, which improves AL performance. In our experiments, the proposed scores improve classification accuracy by 2.1% on average and generalize to different Graph Neural Network architectures.
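The abstract does not state the formulas for FDS, SDS, or EDS. As a rough illustration of what a labeled-set-aware score could look like, the sketch below assumes a feature dissimilarity defined as the Euclidean distance from a candidate node's feature vector to its nearest labeled node. The function names, the choice of distance, and the toy data are all hypothetical and are not taken from the paper.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def feature_dissimilarity_scores(features, labeled, candidates):
    """Assumed score: distance from each candidate to its nearest labeled node.

    A higher score means the candidate is less similar to anything already
    labeled. This is an illustrative definition, not the paper's formula.
    """
    if not labeled:
        # With no labeled nodes there is nothing to compare against.
        return {v: 0.0 for v in candidates}
    return {
        v: min(euclidean(features[v], features[u]) for u in labeled)
        for v in candidates
    }


# Toy data (hypothetical): node id -> feature vector.
features = {
    0: [0.0, 0.0],
    1: [0.1, 0.0],
    2: [3.0, 4.0],
    3: [3.0, 4.1],
}

labeled = {0}
scores = feature_dissimilarity_scores(features, labeled, [1, 2, 3])
query = max(scores, key=scores.get)  # candidate farthest from the labeled set

# After the queried node is labeled, the scores of the remaining candidates
# are recomputed against the enlarged labeled set.
labeled_after = labeled | {query}
scores_after = feature_dissimilarity_scores(features, labeled_after, [1, 2])
```

In this toy example the score of node 2 drops sharply once its near-duplicate neighbor is labeled, which is the kind of dependence on the labeled set that a fixed, per-node score cannot express.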