Semi-supervised learning and weakly supervised learning are important paradigms that aim to reduce the growing demand for labeled data in current machine learning applications. In this paper, we introduce a novel analysis of the classical label propagation algorithm (LPA) (Zhu & Ghahramani, 2002) that additionally leverages useful prior information, specifically probabilistic hypothesized labels on the unlabeled data. We provide an error bound that exploits both the local geometric properties of the underlying graph and the quality of the prior information. We also propose a framework to incorporate multiple sources of noisy information. In particular, we consider the setting of weak supervision, where our sources of information are weak labelers. We demonstrate the effectiveness of our approach on several benchmark weakly supervised classification tasks, showing improvements over existing semi-supervised and weakly supervised methods.
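To make the setting concrete, the following is a minimal sketch of graph-based label propagation warm-started with probabilistic prior labels on the unlabeled nodes. The mixing weight `alpha`, the iteration count, and the function name are illustrative assumptions for this sketch, not the paper's exact formulation.

```python
import numpy as np

def label_propagation_with_priors(W, Y_labeled, labeled_idx, priors,
                                  alpha=0.9, n_iter=100):
    """Sketch of label propagation (Zhu & Ghahramani, 2002) that also
    uses probabilistic hypothesized labels (priors) on unlabeled nodes.

    W           : (n, n) symmetric non-negative adjacency matrix.
    Y_labeled   : (n_l, c) one-hot label matrix for the labeled nodes.
    labeled_idx : indices of the labeled nodes.
    priors      : (n, c) hypothesized label distributions (rows sum to 1).
    alpha       : illustrative weight balancing propagation vs. the prior.
    """
    # Row-normalized transition matrix P = D^{-1} W.
    P = W / W.sum(axis=1, keepdims=True)

    # Initialize from the prior; clamp labeled rows to their true labels.
    F = priors.copy()
    F[labeled_idx] = Y_labeled

    for _ in range(n_iter):
        # Propagate along the graph, then mix back in the prior beliefs.
        F = alpha * (P @ F) + (1.0 - alpha) * priors
        # Re-clamp the labeled nodes after each update.
        F[labeled_idx] = Y_labeled

    return F.argmax(axis=1)
```

In a weak-supervision setting, `priors` could be obtained by aggregating the votes of several weak labelers into per-node label distributions before running the propagation above.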