Semi-supervised learning provides an effective paradigm for leveraging unlabeled data to improve a model's performance. Among the many strategies proposed, graph-based methods have shown excellent properties, in particular because they solve transductive tasks directly, in line with Vapnik's principle, and can be extended efficiently to inductive tasks. In this paper, we propose a novel approach to transductive semi-supervised learning that uses a complete bipartite edge-weighted graph. The approach computes the regularized optimal transport between empirical measures defined on the labeled and unlabeled data points, and derives an affinity matrix from the resulting optimal transport plan. This matrix is then used to propagate labels through the vertices of the graph in an incremental process, which controls the certainty of the predictions through a certainty score based on Shannon's entropy. We also analyze the convergence of our approach and derive an efficient way to extend it to out-of-sample data. We compare the proposed approach experimentally with other label propagation algorithms on 12 benchmark datasets, on which we surpass state-of-the-art results. We release our code.
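The pipeline described above can be illustrated with a minimal NumPy sketch. This is not the released code, and every concrete choice in it is an assumption made for illustration: the squared Euclidean cost, the uniform weights on both empirical measures, the column normalization that turns the transport plan into an affinity matrix, the certainty score defined as one minus the normalized Shannon entropy, the hypothetical `threshold` and `reg` parameters, and the fallback that labels all remaining points when none passes the threshold.

```python
import numpy as np


def sinkhorn_plan(cost, a, b, reg=0.1, n_iters=200):
    """Entropy-regularized OT plan between weight vectors a and b (Sinkhorn iterations)."""
    K = np.exp(-cost / reg)
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]


def ot_label_scores(X_lab, y_lab, X_unl, n_classes, reg=0.1):
    """Class probabilities and certainty scores for the unlabeled points."""
    # Assumed cost: squared Euclidean distance, rescaled to [0, 1].
    cost = ((X_lab[:, None, :] - X_unl[None, :, :]) ** 2).sum(axis=-1)
    cost = cost / max(cost.max(), 1e-12)
    # Assumed uniform weights on both empirical measures.
    a = np.full(len(X_lab), 1.0 / len(X_lab))
    b = np.full(len(X_unl), 1.0 / len(X_unl))
    plan = sinkhorn_plan(cost, a, b, reg)
    # Assumed affinity: each column of the plan normalized to sum to 1.
    affinity = plan / plan.sum(axis=0, keepdims=True)
    probs = affinity.T @ np.eye(n_classes)[y_lab]  # shape (n_unl, n_classes)
    # Assumed certainty score: 1 - H(p) / log(n_classes).
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
    certainty = 1.0 - entropy / np.log(n_classes)
    return probs, certainty


def propagate(X_lab, y_lab, X_unl, n_classes, threshold=0.9, reg=0.1):
    """Incrementally label the unlabeled points, most certain ones first."""
    X_lab, y_lab = X_lab.copy(), y_lab.copy()
    remaining = np.arange(len(X_unl))
    y_pred = np.full(len(X_unl), -1)
    while remaining.size > 0:
        probs, certainty = ot_label_scores(
            X_lab, y_lab, X_unl[remaining], n_classes, reg
        )
        confident = certainty >= threshold
        if not confident.any():
            # Assumed fallback so the loop always terminates.
            confident[:] = True
        idx = remaining[confident]
        y_pred[idx] = probs[confident].argmax(axis=1)
        # Newly labeled points join the labeled side of the bipartite graph.
        X_lab = np.vstack([X_lab, X_unl[idx]])
        y_lab = np.concatenate([y_lab, y_pred[idx]])
        remaining = remaining[~confident]
    return y_pred
```

Calling `propagate(X_lab, y_lab, X_unl, n_classes)` with integer class labels in `y_lab` returns one predicted label per row of `X_unl`. Recomputing the transport plan in every round is the simplest way to let newly labeled points influence later predictions; it is a design choice of this sketch and may differ from the procedure analyzed in the paper.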