We give a computationally efficient PAC active learning algorithm for $d$-dimensional homogeneous halfspaces that can tolerate Massart noise (Massart and N\'ed\'elec, 2006) and Tsybakov noise (Tsybakov, 2004). Specialized to the $\eta$-Massart noise setting, our algorithm achieves an information-theoretically near-optimal label complexity of $\tilde{O}\left( \frac{d}{(1-2\eta)^2} \mathrm{polylog}\left(\frac{1}{\epsilon}\right) \right)$ under a wide range of unlabeled data distributions (specifically, the family of "structured distributions" defined in Diakonikolas et al. (2020)). Under the more challenging Tsybakov noise condition, we identify two subfamilies of noise conditions under which our efficient algorithm provides label complexity guarantees strictly lower than those of passive learning algorithms.