We consider a binary classification problem in which the data come from a mixture of two rotationally symmetric distributions satisfying concentration and anti-concentration properties enjoyed by log-concave distributions, among others. We show that there exists a universal constant $C_{\mathrm{err}}>0$ such that if a pseudolabeler $\boldsymbol{\beta}_{\mathrm{pl}}$ can achieve classification error at most $C_{\mathrm{err}}$, then for any $\varepsilon>0$, an iterative self-training algorithm initialized at $\boldsymbol{\beta}_0 := \boldsymbol{\beta}_{\mathrm{pl}}$ using pseudolabels $\hat y = \mathrm{sgn}(\langle \boldsymbol{\beta}_t, \mathbf{x}\rangle)$ and using at most $\tilde O(d/\varepsilon^2)$ unlabeled examples suffices to learn the Bayes-optimal classifier up to $\varepsilon$ error, where $d$ is the ambient dimension. That is, self-training converts weak learners to strong learners using only unlabeled examples. We additionally show that by running gradient descent on the logistic loss one can obtain a pseudolabeler $\boldsymbol{\beta}_{\mathrm{pl}}$ with classification error $C_{\mathrm{err}}$ using only $O(d)$ labeled examples (i.e., independent of $\varepsilon$). Together, our results imply that mixture models can be learned to within $\varepsilon$ of the Bayes-optimal accuracy using at most $O(d)$ labeled examples and $\tilde O(d/\varepsilon^2)$ unlabeled examples by way of a semi-supervised self-training algorithm.
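The two-stage procedure described above can be illustrated with a minimal sketch. This is not the paper's algorithm verbatim: it assumes a simple instance of the setting (an isotropic Gaussian mixture with mean direction `mu`), fixed step sizes and iteration counts, and plain full-batch gradient descent on the logistic loss for both the labeled warm-up and each self-training round.

```python
import numpy as np

rng = np.random.default_rng(0)

# A simple rotationally symmetric mixture instance (hypothetical parameters):
# x = y * mu + Gaussian noise, with labels y uniform on {-1, +1}.
# The Bayes-optimal classifier is sgn(<mu, x>).
d = 20
mu = np.zeros(d)
mu[0] = 2.0

def sample(n):
    y = rng.choice([-1.0, 1.0], size=n)
    x = y[:, None] * mu + rng.standard_normal((n, d))
    return x, y

def logistic_gd(x, y, beta, lr=0.1, steps=200):
    # Gradient descent on the logistic loss mean(log(1 + exp(-y <beta, x>))).
    for _ in range(steps):
        z = y * (x @ beta)
        grad = -(x * (y / (1.0 + np.exp(z)))[:, None]).mean(axis=0)
        beta = beta - lr * grad
    return beta

# Stage 1: obtain a pseudolabeler beta_pl from O(d) labeled examples.
x_lab, y_lab = sample(5 * d)
beta = logistic_gd(x_lab, y_lab, np.zeros(d))

# Stage 2: iterative self-training on unlabeled examples only,
# relabeling with hat(y) = sgn(<beta_t, x>) at each round.
x_unl, _ = sample(4000)
for _ in range(10):
    y_pseudo = np.sign(x_unl @ beta)
    beta = logistic_gd(x_unl, y_pseudo, beta)

# Alignment of the learned direction with the Bayes-optimal direction mu.
cos = beta @ mu / (np.linalg.norm(beta) * np.linalg.norm(mu))
```

In this sketch the classification error of $\mathrm{sgn}(\langle\boldsymbol{\beta},\mathbf{x}\rangle)$ is controlled by the angle between $\boldsymbol{\beta}$ and the mean direction, so the cosine alignment serves as a proxy for how close the self-trained classifier is to Bayes-optimal.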