Semi-supervised learning is highly useful in the common scenario where labeled data is scarce but unlabeled data is abundant. The graph (or nonlocal) Laplacian is a fundamental smoothing operator for solving various learning tasks. For unsupervised clustering, a spectral embedding based on graph-Laplacian eigenvectors is often used. For semi-supervised problems, the common approach is to solve a constrained optimization problem regularized by a Dirichlet energy based on the graph Laplacian. However, as supervision decreases, Dirichlet optimization becomes suboptimal. We would therefore like to obtain a smooth transition between unsupervised clustering and low-supervision graph-based classification. In this paper, we propose a new type of graph Laplacian adapted to Semi-Supervised Learning (SSL) problems. It is based on both density and contrastive measures and allows the labeled data to be encoded directly in the operator. Thus, we can successfully perform semi-supervised learning using spectral clustering. The benefits of our approach are illustrated on several SSL problems.
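
To make the two baseline approaches mentioned above concrete, the following is a minimal sketch using the standard unnormalized graph Laplacian L = D - W. It is not the operator proposed in the paper. The toy two-cluster data, the Gaussian affinity, the bandwidth `sigma`, and all function names are assumptions chosen for illustration. The first routine clusters without labels via the sign of the second Laplacian eigenvector (the Fiedler vector); the second minimizes the Dirichlet energy u^T L u with u fixed on the labeled nodes, which reduces to the linear system L_uu u_u = -L_ul u_l.

```python
import numpy as np

def gaussian_affinity(X, sigma=1.0):
    # Pairwise squared Euclidean distances turned into Gaussian weights.
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W

def graph_laplacian(W):
    # Unnormalized graph Laplacian L = D - W (assumed here for simplicity).
    return np.diag(W.sum(axis=1)) - W

def fiedler_labels(L):
    # Unsupervised two-way split: sign of the second-smallest eigenvector.
    # np.linalg.eigh returns eigenvalues in ascending order.
    _, vecs = np.linalg.eigh(L)
    return (vecs[:, 1] > 0).astype(int)

def dirichlet_labels(L, labeled_idx, labeled_vals):
    # Minimize u^T L u subject to u = labeled_vals on labeled_idx.
    n = L.shape[0]
    labeled_idx = np.asarray(labeled_idx)
    unlabeled = np.setdiff1d(np.arange(n), labeled_idx)
    u = np.zeros(n)
    u[labeled_idx] = labeled_vals
    rhs = -L[np.ix_(unlabeled, labeled_idx)] @ u[labeled_idx]
    u[unlabeled] = np.linalg.solve(L[np.ix_(unlabeled, unlabeled)], rhs)
    return u

# Toy data (hypothetical): two well-separated Gaussian blobs of 20 points each.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(-2.0, 0.4, size=(20, 2)),
    rng.normal(2.0, 0.4, size=(20, 2)),
])
y_true = np.repeat([0, 1], 20)

W = gaussian_affinity(X, sigma=1.0)
L = graph_laplacian(W)

# Unsupervised: spectral split (cluster ids are only defined up to a swap).
clusters = fiedler_labels(L)

# Semi-supervised: one labeled point per class, encoded as -1 / +1.
u = dirichlet_labels(L, labeled_idx=[0, 20], labeled_vals=[-1.0, 1.0])
pred = (u > 0).astype(int)
```

In this sketch the labels enter only as constraints on the optimization, whereas the spectral split ignores them entirely. The abstract's proposal differs in that it encodes the labeled data in the operator itself, so that spectral clustering can use it.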