The goal of this paper is to show that a single robust estimator of the mean of a multivariate Gaussian distribution can enjoy five desirable properties. First, it is computationally tractable: it can be computed in time at most polynomial in the dimension, the sample size and the logarithm of the inverse of the contamination rate. Second, it is equivariant under translations, uniform scaling and orthogonal transformations. Third, it has a high breakdown point equal to $0.5$ and a nearly-minimax-rate-breakdown point approximately equal to $0.28$. Fourth, it is minimax rate optimal, up to a logarithmic factor, when the data consist of independent observations corrupted by adversarially chosen outliers. Fifth, it is asymptotically efficient when the contamination rate tends to zero. The estimator is obtained by an iterative reweighting approach: each sample point is assigned a weight that is iteratively updated by solving a convex optimization problem. We also establish a dimension-free non-asymptotic risk bound for the expected error of the proposed estimator. It is the first result of this kind in the literature and involves only the effective rank of the covariance matrix. Finally, we show that the obtained results can be extended to sub-Gaussian distributions, as well as to the cases of an unknown contamination rate or an unknown covariance matrix.
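The abstract describes the reweighting step only at a high level (each weight is updated by solving a convex problem). The Python/NumPy sketch below illustrates one natural reading of that description; it is not the authors' algorithm. We *assume* the convex problem is to minimize the largest eigenvalue of the weighted scatter matrix $\sum_i w_i (X_i-\hat\mu)(X_i-\hat\mu)^\top$ over weights with $\sum_i w_i = 1$ and $0 \le w_i \le 1/(n(1-\varepsilon))$, solved approximately by projected subgradient descent, followed by the update $\hat\mu = \sum_i w_i X_i$. The function names `project_capped_simplex` and `reweighted_mean`, the step sizes, the iteration counts and the initialization at the sample mean are hypothetical choices made for illustration, and the sketch carries none of the paper's guarantees.

```python
import numpy as np


def project_capped_simplex(v, cap, iters=60):
    """Euclidean projection of v onto {w : 0 <= w_i <= cap, sum_i w_i = 1}.

    The projection has the form w_i = clip(v_i - tau, 0, cap); the scalar
    shift tau is found by bisection. Requires cap * len(v) >= 1.
    """
    lo, hi = v.min() - cap, v.max()  # weights sum to n*cap >= 1 at lo, to 0 at hi
    for _ in range(iters):
        tau = 0.5 * (lo + hi)
        if np.clip(v - tau, 0.0, cap).sum() > 1.0:
            lo = tau  # total weight too large: increase the shift
        else:
            hi = tau
    return np.clip(v - 0.5 * (lo + hi), 0.0, cap)


def reweighted_mean(X, eps, n_outer=20, n_inner=50):
    """Hypothetical iteratively reweighted robust mean (illustrative sketch).

    Assumption, not the paper's exact method:
    outer loop recentres the data at the current estimate mu; inner loop runs
    projected subgradient descent on the convex objective
        w -> lambda_max( sum_i w_i (x_i - mu)(x_i - mu)^T )
    over capped weights 0 <= w_i <= 1 / (n (1 - eps)), sum_i w_i = 1.
    """
    n = X.shape[0]
    cap = 1.0 / (n * (1.0 - eps))  # no point may carry more than this weight
    w = np.full(n, 1.0 / n)        # start from uniform weights
    mu = w @ X                     # i.e. the ordinary sample mean
    for _ in range(n_outer):
        Z = X - mu
        for t in range(n_inner):
            S = (Z * w[:, None]).T @ Z     # weighted scatter matrix
            _, vecs = np.linalg.eigh(S)    # eigenvalues in ascending order
            u = vecs[:, -1]                # eigenvector of the largest eigenvalue
            g = (Z @ u) ** 2               # subgradient of lambda_max w.r.t. w
            step = cap / np.sqrt(t + 1.0)  # diminishing step on the scale of one weight
            w = project_capped_simplex(w - step * g / max(g.max(), 1e-12), cap)
        mu = w @ X                         # new estimate: weighted mean
    return mu, w


# Usage: 10% of the points are moved far away by an "adversary".
rng = np.random.default_rng(0)
n, d, eps = 200, 5, 0.1
X = rng.standard_normal((n, d))  # inliers from N(0, I_d): the true mean is 0
n_out = int(eps * n)
X[:n_out] = 10.0                 # outliers all placed at (10, ..., 10)
mu_hat, w = reweighted_mean(X, eps)
print("sample mean error:", np.linalg.norm(X.mean(axis=0)))
print("reweighted error: ", np.linalg.norm(mu_hat))
```

Starting from uniform weights (the ordinary sample mean) keeps this sketch equivariant under translations, uniform scaling and orthogonal transformations, because each update depends on the data only through centred differences and normalized squared projections; a coordinate-wise median start would break orthogonal equivariance.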