Semi-supervised learning provides an effective paradigm for leveraging unlabeled data to improve a model's performance. Among the many strategies proposed, graph-based methods have shown excellent properties, in particular because they solve transductive tasks directly, in accordance with Vapnik's principle, and can be extended efficiently to inductive tasks. In this paper, we propose a novel approach to transductive semi-supervised learning that uses a complete bipartite edge-weighted graph. The proposed approach computes the regularized optimal transport between empirical measures defined on the labeled and unlabeled data points, and derives an affinity matrix from the optimal transport plan. This matrix is then used to propagate labels through the vertices of the graph in an incremental process that controls the certainty of the predictions by means of a certainty score based on Shannon entropy. We also analyze the convergence of our approach and derive an efficient way to extend it to out-of-sample data. We compare the proposed approach experimentally with other label propagation algorithms on 12 benchmark datasets, on which it surpasses state-of-the-art results. We release our code.
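The pipeline described in the abstract can be illustrated with a minimal sketch. This is not the released code and should not be read as the authors' exact algorithm: the function names `sinkhorn_plan` and `propagate_labels`, the uniform weights on both empirical measures, the squared Euclidean cost, the column normalization of the plan, the normalized-entropy certainty score, and the parameters `reg`, `threshold`, and `max_rounds` are all assumptions made for illustration.

```python
import numpy as np


def sinkhorn_plan(cost, reg=0.1, n_iter=200):
    """Entropy-regularized OT plan between two uniform empirical measures.

    Plain Sinkhorn iterations; uniform marginals are an assumption here.
    """
    n, m = cost.shape
    a = np.full(n, 1.0 / n)          # mass on labeled points
    b = np.full(m, 1.0 / m)          # mass on unlabeled points
    K = np.exp(-cost / reg)          # Gibbs kernel
    u = np.ones(n)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]


def propagate_labels(X_l, y_l, X_u, n_classes, reg=0.1,
                     threshold=0.9, max_rounds=20):
    """Incremental label propagation driven by an OT-based affinity matrix.

    Hypothetical scheme: in each round, only unlabeled points whose
    certainty score reaches `threshold` are labeled and moved to the
    labeled set; the rest are revisited in the next round.
    """
    X_l, y_l = X_l.copy(), y_l.copy()
    pred = np.full(len(X_u), -1)
    remaining = np.arange(len(X_u))

    for round_idx in range(max_rounds):
        if remaining.size == 0:
            break
        Xu = X_u[remaining]

        # Squared Euclidean cost on the complete bipartite graph,
        # rescaled to [0, 1] to keep the Gibbs kernel well conditioned.
        cost = ((X_l[:, None, :] - Xu[None, :, :]) ** 2).sum(axis=-1)
        cost = cost / max(cost.max(), 1e-12)

        plan = sinkhorn_plan(cost, reg)
        # Affinity of each unlabeled point to the labeled points
        # (each column sums to 1).
        affinity = plan / plan.sum(axis=0, keepdims=True)

        onehot = np.eye(n_classes)[y_l]      # shape (n_labeled, n_classes)
        scores = affinity.T @ onehot         # class distribution per point

        # Certainty = 1 - normalized Shannon entropy of the class scores.
        entropy = -(scores * np.log(scores + 1e-12)).sum(axis=1)
        certainty = 1.0 - entropy / np.log(n_classes)

        confident = certainty >= threshold
        if round_idx == max_rounds - 1 or not confident.any():
            # Fallback (assumption): label everything that is left.
            confident = np.ones_like(confident)

        labels = scores.argmax(axis=1)
        pred[remaining[confident]] = labels[confident]
        X_l = np.vstack([X_l, Xu[confident]])
        y_l = np.concatenate([y_l, labels[confident]])
        remaining = remaining[~confident]

    return pred
```

Calling `propagate_labels(X_l, y_l, X_u, n_classes)` with an integer label array `y_l` returns one predicted class per row of `X_u`. Raising `threshold` makes each round more conservative, so fewer points are labeled per round and later rounds benefit from a larger labeled set.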