We show hardness of improperly learning halfspaces in the agnostic model, both in the distribution-independent and the distribution-specific setting, based on the assumption that worst-case lattice problems, such as GapSVP or SIVP, are hard. In particular, we show that under this assumption there is no efficient algorithm that outputs any binary hypothesis, not necessarily a halfspace, achieving misclassification error better than $\frac 1 2 - \gamma$ even if the optimal misclassification error is as small as $\delta$. Here, $\gamma$ can be smaller than the inverse of any polynomial in the dimension and $\delta$ can be as small as $\exp(-\Omega(\log^{1-c}(d)))$, where $0 < c < 1$ is an arbitrary constant and $d$ is the dimension. For the distribution-specific setting, we show that if the marginal distribution is standard Gaussian, then for any $\beta > 0$ learning halfspaces up to error $OPT_{LTF} + \epsilon$ takes time at least $d^{\tilde{\Omega}(1/\epsilon^{2-\beta})}$ under the same hardness assumptions. Similarly, we show that learning degree-$\ell$ polynomial threshold functions up to error $OPT_{{PTF}_\ell} + \epsilon$ takes time at least $d^{\tilde{\Omega}(\ell^{2-\beta}/\epsilon^{2-\beta})}$. $OPT_{LTF}$ and $OPT_{{PTF}_\ell}$ denote the best error achievable by any halfspace or polynomial threshold function, respectively. Our lower bounds qualitatively match algorithmic guarantees and (nearly) recover known lower bounds based on non-worst-case assumptions. Previously, such hardness results [Daniely16, DKPZ21] were based on average-case complexity assumptions or were restricted to the statistical query model. Our work gives the first hardness results basing these fundamental learning problems on worst-case complexity assumptions. It is inspired by a sequence of recent works showing hardness of learning well-separated Gaussian mixtures based on worst-case lattice problems.