We show that the sample complexity of the robust interpolation problem can be exponential in the input dimension, and we discover a phase transition phenomenon when the data lie in a unit ball. Robust interpolation refers to the problem of interpolating $n$ noisy training data points in $\R^d$ by a Lipschitz function. Although this problem is well understood when the covariates are drawn from a distribution satisfying isoperimetry, much remains unknown about its behavior under generic or even worst-case distributions. Our results are two-fold: 1) Too many data hurt robustness: we prove a tight, universal lower bound of $\Omega(n^{1/d})$ on the Lipschitz constant of the interpolating function for arbitrary data distributions. This rules out the existence of an $\mathcal{O}(1)$-Lipschitz interpolating function in the overparametrized regime where $n=\exp(\omega(d))$. 2) Too few data hurt robustness: $n=\exp(\Omega(d))$ samples are necessary for any $\mathcal{O}(1)$-Lipschitz learning algorithm to achieve small population error under certain distributions. Perhaps surprisingly, our results shed light on the curse of big data and the blessing of dimensionality for robustness, and reveal an intriguing phase transition at $n=\exp(\Theta(d))$.
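As a heuristic illustration (not the paper's proof, which handles arbitrary distributions), a simple packing argument suggests why the $\Omega(n^{1/d})$ rate appears. Among any $n$ points in the unit ball of $\R^d$, a volume comparison yields a pair $x_i, x_j$ with $\|x_i - x_j\| \le C n^{-1/d}$ for a universal constant $C$. If label noise forces the corresponding targets apart, say $|y_i - y_j| \ge c > 0$, then any function $f$ interpolating the data satisfies
\[
\mathrm{Lip}(f) \;\ge\; \frac{|f(x_i) - f(x_j)|}{\|x_i - x_j\|} \;=\; \frac{|y_i - y_j|}{\|x_i - x_j\|} \;\ge\; \frac{c}{C}\, n^{1/d}.
\]
The constants $c$ and $C$ here are assumptions of this sketch; in particular, once $n = \exp(\omega(d))$ the right-hand side exceeds any constant, consistent with the claim that no $\mathcal{O}(1)$-Lipschitz interpolant exists in that regime.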