Many datasets are collected automatically and are therefore easily contaminated by outliers. To overcome this issue, there has recently been renewed interest in robust estimation. However, most robust estimation methods are designed for specific models. In regression, methods have notably been developed for estimating the coefficients of generalized linear models, while other approaches have been proposed, e.g.\ for robust inference in beta regression or in sample selection models. In this paper, we propose Maximum Mean Discrepancy (MMD) optimization as a universal framework for robust regression. We prove non-asymptotic error bounds showing that our estimators are robust to Huber-type contamination. We also provide a (stochastic) gradient algorithm for computing these estimators, whose implementation requires only the ability to sample from the model and to compute the gradient of its log-likelihood function. Finally, we illustrate the proposed approach with a set of simulations.
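A minimal sketch of the kind of stochastic gradient MMD estimator described above, for a toy Gaussian linear-regression model under Huber-type contamination. Every concrete choice here (Gaussian kernel, bandwidth, step size, a per-covariate MMD criterion, the score-function gradient estimator, tail averaging) is an illustrative assumption, not the paper's exact construction; the point is that each step uses only samples from the model and the gradient of its log-likelihood.

```python
import numpy as np


def gauss_kernel(a, b, bw):
    """Gaussian kernel on responses (bandwidth bw is an assumed tuning choice)."""
    return np.exp(-(a - b) ** 2 / (2.0 * bw ** 2))


def mmd_sgd_linreg(X, y, sigma=1.0, bw=2.0, lr=0.1, n_iter=3000, seed=0):
    """Fit regression coefficients by SGD on a per-covariate Gaussian-kernel
    MMD criterion (an illustrative simplification). Each iteration draws two
    model samples per data point and forms an unbiased score-function
    (REINFORCE-style) estimate of the MMD^2 gradient, so only sampling from
    the model and the score of its log-likelihood are required."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    beta = np.zeros(d)
    avg, n_avg = np.zeros(d), 0
    for t in range(n_iter):
        mu = X @ beta
        z1, z2 = rng.standard_normal(n), rng.standard_normal(n)
        y1, y2 = mu + sigma * z1, mu + sigma * z2  # two model samples per point
        s1 = (z1 / sigma)[:, None] * X             # score: grad_beta log p(y1 | x)
        s2 = (z2 / sigma)[:, None] * X
        # Unbiased stochastic gradient of MMD^2; the first (model-vs-model)
        # term has zero mean for this translation family but is kept for
        # generality.
        g = (gauss_kernel(y1, y2, bw)[:, None] * (s1 + s2)
             - 2.0 * gauss_kernel(y1, y, bw)[:, None] * s1).mean(axis=0)
        beta -= lr * g
        if t >= n_iter - 1000:                     # Polyak-style tail averaging
            avg += beta
            n_avg += 1
    return avg / n_avg


# Simulated Huber-type contamination: 10% of responses replaced by gross outliers.
rng = np.random.default_rng(1)
n, beta_true = 200, np.array([2.0, -1.0])
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
y = X @ beta_true + rng.standard_normal(n)
y[: n // 10] = 20.0                                # contaminated observations
beta_hat = mmd_sgd_linreg(X, y)
```

Because the Gaussian kernel essentially ignores model samples that are far from an observation, the gross outliers contribute almost nothing to the gradient, which is the informal mechanism behind the robustness to Huber-type contamination; an ordinary least-squares fit on the same data would be badly biased by the contaminated responses.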