The task of node classification is to infer unknown node labels, given the labels of some nodes along with the network structure and other node attributes. Approaches to this task typically assume homophily, whereby neighboring nodes have similar attributes and a node's label can be predicted from the labels of its neighbors or other proximate (i.e., nearby) nodes in the network. However, this assumption does not always hold: in some cases, labels are better predicted from the individual attributes of each node than from the labels of its proximate nodes. Ideally, node classification methods should adapt flexibly to a range of settings in which unknown labels are predicted from the labels of proximate nodes, from individual node attributes, or partly from both. In this paper, we propose a principled approach, JANE, based on a generative probabilistic model that jointly weighs the role of attributes and node proximity via embeddings in predicting labels. Our experiments on a variety of network datasets demonstrate that JANE exhibits the desired combination of versatility and competitive performance compared to standard baselines.