Real-world machine learning applications often involve deploying neural networks to domains not seen at training time. Hence, we need to understand the extrapolation of nonlinear models -- under what conditions on the distributions and function class can models be guaranteed to extrapolate to new test distributions. This question is challenging because even two-layer neural networks cannot be guaranteed to extrapolate outside the support of the training distribution without further assumptions on the domain shift. This paper takes some initial steps toward analyzing the extrapolation of nonlinear models under structured domain shift. We primarily consider settings where the marginal distribution of each coordinate of the data (or of each subset of coordinates) does not shift significantly across the training and test distributions, but the joint distribution may shift much more. We prove that the family of nonlinear models of the form $f(x)=\sum_i f_i(x_i)$, where $f_i$ is an arbitrary function on the subset of features $x_i$, can extrapolate to unseen distributions if the covariance of the features is well-conditioned. To the best of our knowledge, this is the first result that goes beyond linear models and the bounded density ratio assumption, even though the assumptions on the distribution shift and function class are stylized.
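As a toy illustration of the setting described above (a sketch of our own, not code from the paper; the data-generating choices, basis, and distributions are all assumptions), one can fit an additive model $f(x)=\sum_i f_i(x_i)$ on a training distribution with correlated coordinates and evaluate it on a test distribution that keeps the same Gaussian marginals but flips the correlation -- a joint shift under which the additive model is expected to extrapolate, since the training covariance is well-conditioned:

```python
import numpy as np

rng = np.random.default_rng(0)

def features(x, degree=3):
    # Per-coordinate polynomial basis: the additive model f(x) = sum_i f_i(x_i)
    # is linear in these features, so least squares fits all f_i jointly.
    cols = [np.ones((x.shape[0], 1))]  # intercept
    cols += [x ** d for d in range(1, degree + 1)]
    return np.hstack(cols)

def target(x):
    # Ground truth is itself additive: y = sin(x_1) + x_2^2.
    return np.sin(x[:, 0]) + x[:, 1] ** 2

# Training distribution: strongly correlated coordinates (rho = 0.9),
# but the feature covariance is still well-conditioned.
cov_train = np.array([[1.0, 0.9], [0.9, 1.0]])
x_train = rng.multivariate_normal([0.0, 0.0], cov_train, size=5000)
y_train = target(x_train)

# Fit the additive model by ordinary least squares on the basis features.
w, *_ = np.linalg.lstsq(features(x_train), y_train, rcond=None)

# Test distribution: identical standard-normal marginals per coordinate,
# but an anti-correlated joint (rho = -0.9) -- a large joint shift.
cov_test = np.array([[1.0, -0.9], [-0.9, 1.0]])
x_test = rng.multivariate_normal([0.0, 0.0], cov_test, size=5000)
y_test = target(x_test)

mse = np.mean((features(x_test) @ w - y_test) ** 2)
print(f"test MSE under joint shift: {mse:.4f}")
```

Because the learned per-coordinate functions depend only on the (unchanged) marginals' support, the test error under the shifted joint distribution stays small, in contrast to an unconstrained nonlinear model, which could behave arbitrarily in the region of the new joint distribution that the training data never covered.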