We study efficient PAC learning of homogeneous halfspaces in $\mathbb{R}^d$ under the malicious noise model of Valiant (1985). This is a challenging noise model, and only recently has a near-optimal noise tolerance bound been established under the mild condition that the unlabeled data distribution is isotropic log-concave. However, it has remained open how to achieve the optimal sample complexity at the same time. In this work, we present a new analysis of the algorithm of Awasthi et al. (2017) and show that it essentially achieves the near-optimal sample complexity bound of $\tilde{O}(d)$, improving on the best known result of $\tilde{O}(d^2)$. Our main ingredient is a novel use of a matrix Chernoff-type inequality to bound the spectrum of an empirical covariance matrix for well-behaved distributions, in conjunction with a careful analysis of the localization schemes of Awasthi et al. (2017). We further extend the algorithm and analysis to the more general and stronger nasty noise model of Bshouty et al. (2002), showing that it is still possible to achieve near-optimal noise tolerance and sample complexity in polynomial time.
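As a rough numerical illustration of the spectral bound mentioned above (not the algorithm or the proof of the paper), the following sketch draws samples from one particular isotropic log-concave distribution, a product of unit-variance Laplace marginals, and reports the extreme eigenvalues of the empirical second-moment matrix. The function name `covariance_spectrum`, the choice of distribution, and the sample sizes are assumptions made purely for illustration.

```python
import numpy as np


def covariance_spectrum(n, d, seed=0):
    """Return (smallest, largest) eigenvalue of the empirical
    second-moment matrix of n samples in R^d.

    Assumption for illustration: samples come from a product of
    Laplace marginals with variance 1, which is isotropic and
    log-concave. No noise is injected here.
    """
    rng = np.random.default_rng(seed)
    # Laplace(scale=b) has variance 2*b^2, so b = 1/sqrt(2) gives variance 1.
    x = rng.laplace(loc=0.0, scale=1.0 / np.sqrt(2.0), size=(n, d))
    # Empirical second-moment matrix (the distribution has zero mean).
    sigma_hat = x.T @ x / n
    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order.
    eigs = np.linalg.eigvalsh(sigma_hat)
    return float(eigs[0]), float(eigs[-1])


if __name__ == "__main__":
    d = 20
    for n in (2 * d, 50 * d, 1000 * d):
        lo, hi = covariance_spectrum(n, d)
        print(f"n={n:6d}  min eig={lo:.3f}  max eig={hi:.3f}")
```

With clean samples, the eigenvalues should move toward 1 as the number of samples grows relative to $d$. The analysis in the paper concerns the harder setting where a fraction of the samples is corrupted, which this sketch does not model.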