We develop a computationally efficient PAC active learning algorithm for $d$-dimensional homogeneous halfspaces that can tolerate Massart noise~\citep{massart2006risk} and Tsybakov noise~\citep{tsybakov2004optimal}. Specialized to the $\eta$-Massart noise setting, our algorithm achieves an information-theoretically optimal label complexity of $\tilde{O}\left( \frac{d}{(1-2\eta)^2} \mathrm{polylog}(\frac1\epsilon) \right)$ under a wide range of unlabeled data distributions (specifically, the family of "structured distributions" defined in~\citet{diakonikolas2020polynomial}). Under the more challenging Tsybakov noise condition, we identify two subfamilies of noise conditions under which our algorithm is computationally efficient and provides label complexity guarantees strictly lower than those of passive learning algorithms.