We study efficient PAC learning of homogeneous halfspaces in $\mathbb{R}^d$ in the presence of malicious noise of Valiant~(1985). This is a challenging noise model and only until recently has near-optimal noise tolerance bound been established under the mild condition that the unlabeled data distribution is isotropic log-concave. However, it remains unsettled how to obtain the optimal sample complexity simultaneously. In this work, we present a new analysis for the algorithm of Awasthi et al.~(2017) and show that it essentially achieves the near-optimal sample complexity bound of $\tilde{O}(d)$, improving the best known result of $\tilde{O}(d^2)$. Our main ingredient is a novel incorporation of a Matrix Chernoff-type inequality to bound the spectrum of an empirical covariance matrix for well-behaved distributions, in conjunction with a careful exploration of the localization schemes of Awasthi et al.~(2017). We further extend the algorithm and analysis to the more general and stronger nasty noise model of Bshouty~et~al. (2002), showing that it is still possible to achieve near-optimal noise tolerance and sample complexity in polynomial time.
翻译:我们研究PAC 在Valiant~(1985) 的恶意噪音面前, 有效PAC 学习 $\ mathbb{R ⁇ d$ 的同质半径。 这是一个具有挑战性的噪音模型, 直到最近, 近乎最佳的噪音容忍度才在无标记的数据分配为等温和的对数调的温和条件下建立。 然而, 如何同时获得最佳的样本复杂性, 仍然未解决。 在这项工作中, 我们对Awasthi et al.~( 2017) 的算法进行了新的分析, 并表明它基本上达到了 $\ tillde{O}(d)$的近最佳样本复杂性, 改进了$\ tillde{O}(d ⁇ 2)$的已知最佳效果。 我们的主要成份是将Chernoff 型不平等的新整合, 将良好分配的经验性共变式矩阵的频谱捆绑起来。 我们仔细探索了Awasti et al. (2017) 的本地化计划。 我们进一步将算和分析扩展到接近最优化的样本组合, 显示Bshotironial 的复杂度模型。