Semi-supervised learning has received attention from researchers, as it exploits the structure of unlabeled data to achieve competitive classification results with far fewer labels than supervised approaches. The Local and Global Consistency (LGC) algorithm is one of the most well-known graph-based semi-supervised (GSSL) classifiers. Notably, its solution can be written as a linear combination of the known labels. The coefficients of this linear combination depend on a parameter $\alpha$, which determines how the reward decays over time as a random walk reaches labeled vertices. In this work, we discuss how removing the self-influence of a labeled instance may be beneficial, and how it relates to leave-one-out error. Moreover, we propose to minimize this leave-one-out loss with automatic differentiation. Within this framework, we propose methods to estimate label reliability and diffusion rate. Optimizing the diffusion rate is more efficiently accomplished with a spectral representation. Results show that the label reliability approach competes with robust L1-norm methods, and that removing diagonal entries reduces the risk of overfitting and leads to suitable criteria for parameter selection.
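The abstract's two central points can be sketched numerically: the LGC solution is linear in the known labels, and zeroing the diagonal of the propagation matrix removes each labeled vertex's self-influence, giving a leave-one-out style prediction. The following is a minimal sketch under assumed toy data (the graph `W`, labels `Y`, and `alpha = 0.9` are illustrative choices, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy symmetric similarity graph over 6 vertices (assumption for illustration).
W = rng.random((6, 6))
W = (W + W.T) / 2
np.fill_diagonal(W, 0)

# Symmetric normalization S = D^{-1/2} W D^{-1/2}, as in LGC.
d = W.sum(axis=1)
S = W / np.sqrt(np.outer(d, d))

# One-hot labels for the first three vertices, two classes; rest unlabeled (zeros).
Y = np.zeros((6, 2))
Y[0, 0] = Y[1, 1] = Y[2, 0] = 1.0

alpha = 0.9  # diffusion rate; the paper proposes optimizing it

# Closed-form LGC solution F = (I - alpha * S)^{-1} Y:
# the rows of P give the coefficients of the linear combination of known labels.
P = np.linalg.inv(np.eye(6) - alpha * S)
F = P @ Y

# Removing the diagonal of P removes each labeled instance's self-influence,
# so F_loo restricted to labeled vertices acts as a leave-one-out prediction.
P_loo = P - np.diag(np.diag(P))
F_loo = P_loo @ Y
```

Comparing `F_loo` against `Y` on the labeled vertices yields the leave-one-out loss that, per the abstract, can be minimized with automatic differentiation.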