Self-training (ST) is a straightforward and standard approach in semi-supervised learning, successfully applied to many machine learning problems. The performance of ST strongly depends on the supervised learning method used in the refinement step and on the nature of the given data; hence, a general performance guarantee derived from a concise theory may become loose in a concrete setup. However, theoretical methods that sharply predict how the performance of ST depends on the various details of each learning scenario are limited. This study develops a novel theoretical framework for sharply characterizing the generalization abilities of models trained by ST, using the non-rigorous replica method of statistical physics. We consider the ST of a linear model that minimizes the ridge-regularized cross-entropy loss when the data are generated from a two-component Gaussian mixture. We show that the generalization performance of ST in each iteration is sharply characterized by a small, finite number of variables that satisfy a set of deterministic self-consistent equations. By numerically solving these self-consistent equations, we find that ST's generalization performance approaches that of supervised learning with a very simple regularization schedule, provided the label bias is small and a moderately large number of iterations is used.
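To make the setup concrete, the following is a minimal simulation sketch of the ST procedure described above: a linear classifier trained with ridge-regularized cross-entropy (L2 logistic regression) on a two-component Gaussian mixture, iteratively refit on its own pseudo-labels. The sample sizes, signal strength, and regularization value are illustrative assumptions, and the sketch does not reproduce the replica analysis or the self-consistent equations themselves.

```python
# Minimal self-training (ST) sketch on a two-component Gaussian mixture.
# All sizes, the mean direction `mu`, and the regularization strength are
# illustrative assumptions, not values taken from the paper.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

d = 200                        # input dimension
mu = np.ones(d) / np.sqrt(d)   # mixture mean direction (unit norm)
n_lab, n_unlab, n_test = 50, 2000, 5000

def sample(n):
    """Two-component Gaussian mixture: y = +/-1, x = y * mu + standard Gaussian noise."""
    y = rng.choice([-1, 1], size=n)
    x = y[:, None] * mu[None, :] + rng.standard_normal((n, d))
    return x, y

x_lab, y_lab = sample(n_lab)
x_unlab, _ = sample(n_unlab)     # labels of this set are never used
x_test, y_test = sample(n_test)

# Ridge-regularized cross-entropy minimization (C is the inverse regularization strength).
clf = LogisticRegression(C=1.0, max_iter=1000).fit(x_lab, y_lab)

for t in range(10):              # ST iterations (refinement steps)
    pseudo = clf.predict(x_unlab)                 # hard pseudo-labels from the current model
    x_all = np.vstack([x_lab, x_unlab])
    y_all = np.concatenate([y_lab, pseudo])
    clf = LogisticRegression(C=1.0, max_iter=1000).fit(x_all, y_all)
    print(f"iteration {t + 1}: test accuracy = {clf.score(x_test, y_test):.3f}")
```

In the paper's analysis, the test accuracy traced by such an iteration is characterized analytically by a small set of order parameters obeying deterministic self-consistent equations, rather than by Monte Carlo simulation as in this sketch.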