We develop in this paper a framework of empirical gain maximization (EGM) to address the robust regression problem in which heavy-tailed noise or outliers may be present in the response variable. The idea of EGM is to approximate the density function of the noise distribution rather than approximating the truth function directly, as is usually done. Unlike classical maximum likelihood estimation, which assigns equal importance to all observations and can therefore be problematic in the presence of abnormal observations, EGM schemes can be interpreted from a minimum distance estimation viewpoint and allow such observations to be downweighted or ignored. Furthermore, we show that several well-known robust nonconvex regression paradigms, such as Tukey regression and truncated least squares regression, can be reformulated within this new framework. We then develop a learning theory for EGM, by means of which a unified analysis can be conducted for these well-established but not fully understood regression approaches. The new framework also yields a novel interpretation of existing bounded nonconvex loss functions. Within it, two seemingly unrelated notions, Tukey's biweight loss for robust regression and the triweight kernel for nonparametric smoothing, turn out to be closely related: we show that Tukey's biweight loss can be derived from the triweight kernel. Similarly, other bounded nonconvex loss functions frequently employed in machine learning, such as the truncated square loss, the Geman-McClure loss, and the exponential squared loss, can also be derived from certain smoothing kernels in statistics. In addition, the new framework enables us to devise new bounded nonconvex loss functions for robust learning.
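To make the kernel-to-loss correspondence concrete, the following is a minimal sketch using the standard normalizations of the triweight kernel and Tukey's biweight loss; the exact scaling constants adopted in the paper may differ. With
\[
K(u) \;=\; \tfrac{35}{32}\,\bigl(1-u^{2}\bigr)^{3}\,\mathbb{1}_{\{|u|\le 1\}},
\qquad
\rho_{\sigma}(t) \;=\;
\begin{cases}
\tfrac{\sigma^{2}}{6}\Bigl[\,1-\bigl(1-(t/\sigma)^{2}\bigr)^{3}\Bigr], & |t|\le\sigma,\\[2pt]
\tfrac{\sigma^{2}}{6}, & |t|>\sigma,
\end{cases}
\]
maximizing the empirical gain $\frac{1}{n}\sum_{i=1}^{n} K\bigl((y_i-f(x_i))/\sigma\bigr)$ over candidate functions $f$ is, up to a positive multiplicative constant, the same as minimizing the empirical Tukey biweight risk $\frac{1}{n}\sum_{i=1}^{n}\rho_{\sigma}\bigl(y_i-f(x_i)\bigr)$, since $\tfrac{35}{32}-K(t/\sigma)$ is proportional to $\rho_{\sigma}(t)$.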