We study efficient PAC learning of homogeneous halfspaces in $\mathbb{R}^d$ in the presence of the malicious noise model of Valiant~(1985). This is a challenging noise model, and only recently has a near-optimal noise tolerance bound been established, under the mild condition that the unlabeled data distribution is isotropic log-concave. However, it remained open how to simultaneously obtain the optimal sample complexity. In this work, we present a new analysis of the algorithm of Awasthi~et~al.~(2017) and show that it essentially achieves the near-optimal sample complexity bound of $\tilde{O}(d)$, improving upon the best known result of $\tilde{O}(d^2)$. Our main ingredient is a novel incorporation of a matrix Chernoff-type inequality to bound the spectrum of an empirical covariance matrix for well-behaved distributions, in conjunction with a careful exploration of the localization schemes of Awasthi~et~al.~(2017). We further extend the algorithm and analysis to the more general and stronger nasty noise model of Bshouty~et~al.~(2002), showing that it is still possible to achieve near-optimal noise tolerance and sample complexity in polynomial time.
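For reference, a generic matrix Chernoff-type upper-tail bound (in the form given by Tropp, 2012) is sketched below; the notation $X_i$, $R$, $\mu_{\max}$, $\delta$ is introduced here purely for illustration and is not taken from the abstract, and the specific way such a bound is combined with truncation of the log-concave marginals and with localization is the subject of the analysis itself. Let $X_1,\dots,X_n$ be independent random positive semidefinite $d \times d$ matrices with $\lambda_{\max}(X_i) \le R$ almost surely, and set $\mu_{\max} := \lambda_{\max}\!\big(\sum_{i=1}^n \mathbb{E}[X_i]\big)$. Then, for every $\delta \ge 0$,
\[
  \Pr\!\Big[\lambda_{\max}\Big(\sum_{i=1}^n X_i\Big) \ge (1+\delta)\,\mu_{\max}\Big]
  \;\le\; d \cdot \left(\frac{e^{\delta}}{(1+\delta)^{1+\delta}}\right)^{\mu_{\max}/R}.
\]
Heuristically, taking $X_i = \frac{1}{n} x_i x_i^{\top}$ for isotropic samples (after a suitable truncation so that the boundedness condition holds, an assumption of this sketch) gives $\mu_{\max} \approx 1$ and an exponent of order $n/(d\,\mathrm{polylog})$, so the upper tail becomes negligible once $n = \tilde{O}(d)$, which is the sample regime targeted above.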