It is well known that trimmed sample means are robust against heavy tails and data contamination. This paper analyzes the performance of trimmed means and related methods in two novel contexts. The first is the estimation of expectations of functions in a given family, with uniform error bounds; this is closely related to the problem of estimating the mean of a random vector under a general norm. The second is regression with quadratic loss. In both cases, trimmed-mean-based estimators are the first to obtain optimal dependence on the (adversarial) contamination level. Moreover, they match or improve upon the state of the art with respect to heavy tails. Experiments with synthetic data show that a natural "trimmed mean linear regression" method often performs better than both ordinary least squares and alternative methods based on median-of-means.
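For readers unfamiliar with the basic idea, the following is a minimal sketch of a classical symmetric trimmed mean on one-dimensional data. It is an illustration only, not the estimators analyzed in the paper for function families or regression; the function name `trimmed_mean`, the parameter `trim_fraction`, and the synthetic data are assumptions made for this example.

```python
import random
import statistics


def trimmed_mean(values, trim_fraction):
    """Average the values left after dropping the k smallest and k largest,
    where k = floor(trim_fraction * n). Hypothetical helper for illustration."""
    if not 0 <= trim_fraction < 0.5:
        raise ValueError("trim_fraction must lie in [0, 0.5)")
    ordered = sorted(values)
    k = int(trim_fraction * len(ordered))
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


# Synthetic data (assumed for the example): 95 standard normal draws
# plus 5 gross outliers standing in for contamination.
rng = random.Random(0)
clean = [rng.gauss(0.0, 1.0) for _ in range(95)]
contaminated = clean + [1000.0] * 5

plain = statistics.fmean(contaminated)     # pulled far from 0 by the outliers
robust = trimmed_mean(contaminated, 0.1)   # the outliers fall in the trimmed tail
print(f"sample mean:  {plain:.3f}")
print(f"trimmed mean: {robust:.3f}")
```

In this sketch the trimming fraction (0.1) exceeds the contaminated fraction (0.05), so every outlier is discarded before averaging; in general the trimming level would need to be chosen in relation to the contamination level and the desired confidence.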