Recently there has been increased interest in semi-supervised classification in the presence of graphical information. A new class of learning models has emerged that relies, at its most basic level, on classifying the data after first applying a graph convolution. To understand the merits of this approach, we study the classification of a mixture of Gaussians, where the data corresponds to the node attributes of a stochastic block model. We show that graph convolution extends the regime in which the data is linearly separable by a factor of roughly $1/\sqrt{D}$, where $D$ is the expected degree of a node, as compared to the mixture model data on its own. Furthermore, we find that the linear classifier obtained by minimizing the cross-entropy loss after the graph convolution generalizes to out-of-distribution data where the unseen data can have different intra- and inter-class edge probabilities from the training data.
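The setting described above can be illustrated with a small simulation. The following is a minimal sketch, not the authors' code: it assumes a two-class model with means $\pm\mu$, a random-walk normalized convolution $D^{-1}AX$ with self-loops, and an oracle linear classifier in the direction $\mu$ in place of the cross-entropy minimizer. The function names and every parameter value (`n`, `d`, `mu`, `sigma`, `p`, `q`) are hypothetical choices made for illustration only.

```python
import numpy as np


def sample_csbm(n, d, mu, sigma, p, q, rng):
    """Sample node features from a two-component Gaussian mixture and a
    graph from a two-block stochastic block model (illustrative setup)."""
    labels = rng.integers(0, 2, size=n)          # class of each node: 0 or 1
    signs = 2 * labels - 1                       # map classes to -1 / +1
    # Features: class mean (+mu or -mu) plus isotropic Gaussian noise.
    X = signs[:, None] * mu[None, :] + sigma * rng.standard_normal((n, d))
    # Edge probability p within a class, q across classes.
    same = labels[:, None] == labels[None, :]
    probs = np.where(same, p, q)
    upper = np.triu(rng.random((n, n)) < probs, k=1)
    A = (upper | upper.T).astype(float)          # symmetric, no multi-edges
    np.fill_diagonal(A, 1.0)                     # assumption: add self-loops
    return X, A, labels


def graph_convolution(A, X):
    """Average each node's features over its neighborhood: D^{-1} A X."""
    deg = A.sum(axis=1, keepdims=True)
    return (A @ X) / deg


def linear_accuracy(X, labels, w):
    """Accuracy of the linear classifier x -> 1[<x, w> > 0]."""
    pred = (X @ w > 0).astype(int)
    return float((pred == labels).mean())


rng = np.random.default_rng(0)
n, d = 400, 10
mu = np.full(d, 0.15)        # hypothetical mean; small relative to the noise
sigma, p, q = 1.0, 0.5, 0.1  # hypothetical noise level and edge probabilities

X, A, labels = sample_csbm(n, d, mu, sigma, p, q, rng)
X_conv = graph_convolution(A, X)

acc_raw = linear_accuracy(X, labels, mu)         # features only
acc_conv = linear_accuracy(X_conv, labels, mu)   # after graph convolution
print(f"accuracy without convolution: {acc_raw:.3f}")
print(f"accuracy with convolution:    {acc_conv:.3f}")
```

In this sketch, averaging over roughly $D$ neighbors shrinks the noise standard deviation by about $1/\sqrt{D}$ while the class means shrink only by a constant factor of about $(p-q)/(p+q)$, which is the intuition behind the improved separability regime. Changing `p` and `q` at evaluation time, while keeping $p > q$, gives a rough way to probe the out-of-distribution claim.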