Semi-supervised learning (SSL) has attracted increasing attention in machine learning as a way of forming a classifier when the training data include feature vectors whose class labels are missing. In this study, we consider the generative model approach proposed by Ahfock & McLachlan (2020), who introduced a framework with a missingness mechanism for the missing labels of the unclassified features. In the case of two multivariate normal classes with a common covariance matrix, they showed that the estimated Bayes' rule formed by this SSL approach can actually have a lower error rate than the rule that could be formed from a completely classified sample. We examine this rather surprising result in cases where there may be more than two normal classes with not necessarily common covariance matrices.