In semi-supervised classification, one is given access to both labeled and unlabeled data. As unlabeled data is typically cheaper to acquire than labeled data, this setup becomes advantageous as soon as the unlabeled data can be exploited to produce a better classifier than one trained on labeled data alone. However, the conditions under which such an improvement is possible are not yet fully understood. Our analysis focuses on improvements in the minimax learning rate in terms of the number $\ell$ of labeled examples (with the number of unlabeled examples allowed to depend on $\ell$). We argue that, for such improvements to be realistic and indisputable, certain specific conditions should be satisfied, and that previous analyses have failed to meet those conditions. We then demonstrate examples where these conditions can be met, in particular showing rate changes from $1/\sqrt{\ell}$ to $e^{-c\ell}$ and from $1/\sqrt{\ell}$ to $1/\ell$. These results improve our understanding of what is and is not possible in semi-supervised learning.
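To make the size of these rate changes concrete, the following sketch computes how many labeled examples $\ell$ each rate needs before it drops to a target error $\varepsilon = 1/\texttt{inv\_eps}$. It is only an arithmetic illustration under loud assumptions: every multiplicative constant in the bounds is set to one, the constant $c$ defaults to an arbitrary placeholder value of 1 (not a value from the paper), and the helper name `labels_needed` is hypothetical.

```python
import math


def labels_needed(inv_eps: int, c: float = 1.0) -> dict:
    """Smallest labeled-sample count l at which each rate reaches eps = 1/inv_eps.

    Hypothetical helper for illustration only: multiplicative constants are
    ignored, and c is a placeholder, not a constant taken from the paper.
    """
    return {
        # 1/sqrt(l) <= eps  <=>  l >= 1/eps^2
        "1/sqrt(l)": inv_eps ** 2,
        # 1/l <= eps  <=>  l >= 1/eps
        "1/l": inv_eps,
        # exp(-c*l) <= eps  <=>  l >= ln(1/eps)/c
        "exp(-c*l)": math.ceil(math.log(inv_eps) / c),
    }


print(labels_needed(100))
# {'1/sqrt(l)': 10000, '1/l': 100, 'exp(-c*l)': 5}
```

Under these simplifications, reaching an error of 1% takes on the order of $10^4$ labeled examples at the $1/\sqrt{\ell}$ rate, $10^2$ at the $1/\ell$ rate, and only a handful at the $e^{-c\ell}$ rate. Real bounds carry constants that shift these numbers, but the quadratic versus linear versus logarithmic dependence on $1/\varepsilon$ is what the rate changes amount to.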