Semi-supervised learning (SSL) has attracted increasing attention in machine learning as a way of forming a classifier when the training data consist of a limited number of classified observations and a much larger number of unclassified observations. Classified data can be costly to procure: beyond the high cost of acquiring the data, attempts to provide the true class labels for the unclassified observations can raise financial, time, and ethical issues. We provide here a review of statistical SSL approaches to this problem, focussing on the recent result that a classifier formed from a partially classified sample can actually have a smaller expected error rate than one formed from the same sample were it completely classified.