Graph-based semi-supervised learning methods combine graph structure with labeled data to classify unlabeled data. In this work, we study the effect of a noisy oracle on classification. In particular, we derive the Maximum A Posteriori (MAP) estimator for clustering a Degree-Corrected Stochastic Block Model (DC-SBM) when a noisy oracle reveals a fraction of the labels. We then propose an algorithm derived from a continuous relaxation of the MAP estimator and establish its consistency. Numerical experiments show that our approach achieves promising performance on synthetic and real data sets, even when the labeled data are very noisy.