The notions of interpolation and extrapolation are fundamental in various fields, from deep learning to function approximation. Interpolation occurs for a sample $x$ whenever this sample falls inside or on the boundary of the given dataset's convex hull. Extrapolation occurs when $x$ falls outside of that convex hull. One fundamental (mis)conception is that state-of-the-art algorithms work so well because of their ability to correctly interpolate training data. A second (mis)conception is that interpolation occurs throughout tasks and datasets; in fact, many intuitions and theories rely on that assumption. We empirically and theoretically argue against those two points and demonstrate that on any high-dimensional ($>$100) dataset, interpolation almost surely never happens. Those results challenge the validity of our current interpolation/extrapolation definition as an indicator of generalization performance.
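To make the convex-hull definition concrete, the following is a minimal sketch (not from the paper) of the standard membership test: a sample $x$ is in the interpolation regime if and only if it can be written as a convex combination of the training samples, which reduces to a linear-programming feasibility problem. The function name `in_convex_hull`, the Gaussian toy data, and the specific sample sizes are illustrative assumptions; SciPy's `linprog` is assumed available.

```python
import numpy as np
from scipy.optimize import linprog

def in_convex_hull(x, points):
    """Return True if x lies inside or on the boundary of the convex hull of
    `points` (shape: n_samples x d). We check feasibility of the LP:
    find lambda >= 0 with sum_i lambda_i = 1 and sum_i lambda_i * p_i = x."""
    n = points.shape[0]
    # Equality constraints stack the convex-combination equations and the
    # simplex constraint sum(lambda) = 1.
    A_eq = np.vstack([points.T, np.ones((1, n))])
    b_eq = np.concatenate([x, [1.0]])
    res = linprog(c=np.zeros(n), A_eq=A_eq, b_eq=b_eq,
                  bounds=(0, None), method="highs")
    return res.success

# Toy illustration of the abstract's claim: in low dimension most new samples
# interpolate, while in high dimension almost none do.
rng = np.random.default_rng(0)
for d in (2, 100):
    train = rng.standard_normal((500, d))
    test = rng.standard_normal((200, d))
    rate = np.mean([in_convex_hull(t, train) for t in test])
    print(f"d={d}: fraction of test samples in interpolation regime = {rate:.2f}")
```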