In semi-supervised learning, the paradigm of self-training refers to the idea of learning from pseudo-labels suggested by the learner itself. Across various domains, corresponding methods have proven effective and achieve state-of-the-art performance. However, pseudo-labels typically stem from ad-hoc heuristics that rely on the quality of the predictions without guaranteeing their validity. One such method, so-called credal self-supervised learning, maintains pseudo-supervision in the form of sets of (instead of single) probability distributions over labels, thereby allowing for a flexible yet uncertainty-aware labeling. Again, however, there is no justification beyond empirical effectiveness. To address this deficiency, we make use of conformal prediction, an approach that comes with guarantees on the validity of set-valued predictions. As a result, the construction of credal sets of labels is supported by a rigorous theoretical foundation, leading to better calibrated and less error-prone supervision for unlabeled data. Along with this, we present effective algorithms for learning from credal self-supervision. An empirical study demonstrates excellent calibration properties of the pseudo-supervision, as well as the competitiveness of our method on several benchmark datasets.
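To make the conformal-prediction ingredient concrete, the following is a minimal sketch of split (inductive) conformal prediction for classification, which turns a model's class probabilities into set-valued label predictions with a marginal coverage guarantee of at least 1 − α. This is a generic illustration of the technique, not the paper's specific construction; the function name and the toy data are illustrative.

```python
# Minimal sketch of split conformal prediction for classification.
# Assumptions: a held-out calibration set with known labels, and class
# probabilities from an arbitrary (possibly miscalibrated) classifier.
import numpy as np

rng = np.random.default_rng(0)

def conformal_label_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    """Prediction sets that cover the true label with prob. >= 1 - alpha.

    cal_probs:  (n, K) predicted class probabilities on the calibration set
    cal_labels: (n,)   true labels of the calibration set
    test_probs: (m, K) predicted class probabilities on test points
    """
    # Nonconformity score: 1 minus the probability of the true class.
    scores = 1.0 - cal_probs[np.arange(len(cal_labels)), cal_labels]
    # Finite-sample-corrected (1 - alpha) quantile of calibration scores.
    n = len(scores)
    level = np.ceil((n + 1) * (1 - alpha)) / n
    q = np.quantile(scores, level, method="higher")
    # A label enters the set iff its score does not exceed the threshold.
    return [np.where(1.0 - p <= q)[0] for p in test_probs]

# Toy usage with random probability vectors (each row sums to 1).
cal_p = rng.dirichlet(np.ones(3), size=100)
cal_y = rng.integers(0, 3, size=100)
test_p = rng.dirichlet(np.ones(3), size=5)
sets = conformal_label_sets(cal_p, cal_y, test_p, alpha=0.1)
```

In a self-training loop, such label sets (rather than single hard pseudo-labels) can then serve as set-valued supervision on unlabeled data.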