Semi-supervised learning and weakly supervised learning are important paradigms that aim to reduce the growing demand for labeled data in current machine learning applications. In this paper, we introduce a novel analysis of the classical label propagation algorithm (LPA) (Zhu & Ghahramani, 2002) that additionally takes advantage of useful prior information, specifically probabilistic hypothesized labels on the unlabeled data. We provide an error bound that exploits both the local geometric properties of the underlying graph and the quality of the prior information. We also propose a framework to incorporate multiple sources of noisy information. In particular, we consider the setting of weak supervision, where our sources of information are weak labelers. We demonstrate the effectiveness of our approach on multiple benchmark weakly supervised classification tasks, showing improvements over existing semi-supervised and weakly supervised methods.
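As a rough illustration of the ingredients referenced above, the sketch below combines the classical Zhu & Ghahramani propagation iteration with a probabilistic prior on the unlabeled nodes. It is a minimal sketch under stated assumptions, not the paper's exact formulation or analysis: the function name, the interpolation weight `alpha`, and the aggregation of weak-labeler votes into `Y_prior` are all illustrative choices.

```python
import numpy as np

def label_propagation_with_prior(W, Y_prior, labeled_mask, Y_labeled,
                                 alpha=0.9, n_iter=200):
    """Minimal sketch: label propagation seeded and regularized by a prior.

    W            : (n, n) symmetric nonnegative affinity matrix of the graph.
    Y_prior      : (n, c) hypothesized class-probability rows for every node
                   (e.g. aggregated weak-labeler votes) -- an assumption here.
    labeled_mask : (n,) boolean array marking the truly labeled nodes.
    Y_labeled    : (n, c) one-hot label rows; only rows where labeled_mask
                   is True are read.
    alpha        : illustrative weight on the propagated signal vs. the prior.
    """
    # Row-normalize the affinities into a transition matrix.
    D_inv = 1.0 / np.maximum(W.sum(axis=1, keepdims=True), 1e-12)
    P = D_inv * W

    # Initialize unlabeled nodes from the prior, clamp labeled nodes.
    F = Y_prior.copy()
    F[labeled_mask] = Y_labeled[labeled_mask]

    for _ in range(n_iter):
        # Diffuse over the graph while pulling toward the prior.
        F = alpha * (P @ F) + (1 - alpha) * Y_prior
        # Re-clamp the known labels after each propagation step.
        F[labeled_mask] = Y_labeled[labeled_mask]

    # Renormalize rows into class-probability estimates.
    return F / np.maximum(F.sum(axis=1, keepdims=True), 1e-12)
```

In this sketch, setting `alpha = 1` and replacing `Y_prior` with a zero initialization on the unlabeled nodes recovers the standard LPA iteration; the prior term is what lets noisy sources such as weak labelers influence the propagated solution.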