We study the complexity of PAC learning halfspaces in the presence of Massart (bounded) noise. Specifically, given labeled examples $(x, y)$ from a distribution $D$ on $\mathbb{R}^{n} \times \{ \pm 1\}$ such that the marginal distribution on $x$ is arbitrary and the labels are generated by an unknown halfspace corrupted with Massart noise at rate $\eta<1/2$, we want to compute a hypothesis with small misclassification error. Characterizing the efficient learnability of halfspaces in the Massart model has remained a longstanding open problem in learning theory. Recent work gave a polynomial-time learning algorithm for this problem with error $\eta+\epsilon$. This error upper bound can be far from the information-theoretically optimal bound of $\mathrm{OPT}+\epsilon$. More recent work showed that {\em exact learning}, i.e., achieving error $\mathrm{OPT}+\epsilon$, is hard in the Statistical Query (SQ) model. In this work, we show that there is an exponential gap between the information-theoretically optimal error and the best error that can be achieved by a polynomial-time SQ algorithm. In particular, our lower bound implies that no efficient SQ algorithm can approximate the optimal error within any polynomial factor.
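For context, the Massart noise condition used above can be stated formally as follows; this is the standard formulation, with the notation $f$ for the unknown target halfspace and $\eta(x)$ for the pointwise flip probability introduced here for illustration:
\[
\Pr_{(x,y)\sim D}\bigl[\, y \neq f(x) \mid x \,\bigr] \;=\; \eta(x) \;\leq\; \eta \;<\; \tfrac{1}{2} \quad \text{for every } x,
\qquad
\mathrm{OPT} \;=\; \Pr_{(x,y)\sim D}\bigl[\, y \neq f(x) \,\bigr] \;=\; \mathbf{E}_{x}\bigl[\eta(x)\bigr],
\]
where $f(x) = \mathrm{sign}(\langle w, x\rangle - t)$ for some unknown $w \in \mathbb{R}^n$ and $t \in \mathbb{R}$. Since $\mathrm{OPT} \leq \eta$, the guarantee $\mathrm{OPT}+\epsilon$ can be much stronger than $\eta+\epsilon$, which is the gap referred to in the abstract.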