Contamination can severely distort an estimator unless the estimation procedure is suitably robust. This is a well-known issue that has been addressed in Robust Statistics; however, the relation between contamination and distorted variable selection has rarely been considered in the literature. For variable selection, many methods for sparse model selection have been proposed, including Stability Selection, a meta-algorithm built on top of a base variable selection algorithm in order to immunize against particular data configurations. We introduce the variable selection breakdown point, which quantifies the number of cases (for case-wise contamination) or cells (for cell-wise contamination) that have to be contaminated so that no relevant variable is detected. We show that particular outlier configurations can completely mislead model selection and argue why even cell-wise robust methods cannot fix this problem. We combine the variable selection breakdown point with resampling, resulting in the Stability Selection breakdown point, which quantifies the robustness of Stability Selection. We propose a trimmed Stability Selection that aggregates only the models with the lowest in-sample losses so that, heuristically, models computed on heavily contaminated resamples are trimmed away. An extensive simulation study with non-robust regression and classification algorithms, as well as with Sparse Least Trimmed Squares, reveals both the potential of our approach to boost model selection robustness and the fragility of variable selection using non-robust algorithms, even for an extremely small cell-wise contamination rate.
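The trimming idea can be illustrated with a minimal, pure-Python sketch. It is not the authors' implementation. The following are all assumptions made for illustration: the base selector (greedy forward stepwise least squares), the subsample size of half the data, the trimming fraction `trim`, and the selection-frequency `threshold`. The function names `forward_stepwise` and `trimmed_stability_selection` are likewise hypothetical. The sketch fits one model per subsample, sorts the models by their in-sample loss, keeps only the lowest-loss fraction, and reports the variables whose selection frequency among the retained models reaches the threshold.

```python
import math
import random


def forward_stepwise(X, y, q):
    """Hypothetical base selector: greedy forward stepwise least squares.

    Returns the q selected column indices and the in-sample mean squared
    residual of the least-squares fit (with intercept) on those columns.
    """
    n, p = len(X), len(X[0])
    # Center the response and every column so that the intercept is absorbed.
    y_mean = sum(y) / n
    resid = [v - y_mean for v in y]
    cols = []
    for j in range(p):
        m = sum(row[j] for row in X) / n
        cols.append([row[j] - m for row in X])
    basis, selected = [], []  # orthonormal basis spanning the selected columns
    for _ in range(min(q, p)):
        best_j, best_score, best_u = None, -1.0, None
        for j in range(p):
            if j in selected:
                continue
            # Orthogonalize column j against the already selected columns.
            v = cols[j][:]
            for u in basis:
                c = sum(a * b for a, b in zip(v, u))
                v = [a - c * b for a, b in zip(v, u)]
            norm = math.sqrt(sum(a * a for a in v))
            if norm < 1e-12:
                continue  # (numerically) collinear with the current selection
            score = abs(sum(a * b for a, b in zip(resid, v))) / norm
            if score > best_score:
                best_j, best_score, best_u = j, score, [a / norm for a in v]
        if best_j is None:
            break
        selected.append(best_j)
        basis.append(best_u)
        # Project the new direction out of the residual.
        c = sum(a * b for a, b in zip(resid, best_u))
        resid = [a - c * b for a, b in zip(resid, best_u)]
    return selected, sum(r * r for r in resid) / n


def trimmed_stability_selection(X, y, q, n_resamples=50, trim=0.2,
                                threshold=0.6, seed=0):
    """Sketch of a trimmed Stability Selection (all parameters assumed)."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    m = n // 2  # assumed subsample size: half the data, drawn without replacement
    fits = []
    for _ in range(n_resamples):
        idx = rng.sample(range(n), m)
        sel, loss = forward_stepwise([X[i] for i in idx], [y[i] for i in idx], q)
        fits.append((loss, sel))
    # Keep only the lowest-loss models; heuristically, resamples that are
    # heavily contaminated should produce large in-sample losses.
    n_keep = max(1, math.ceil((1 - trim) * n_resamples))
    kept = sorted(fits, key=lambda t: t[0])[:n_keep]
    freq = [0.0] * p
    for _, sel in kept:
        for j in sel:
            freq[j] += 1 / n_keep
    stable = [j for j in range(p) if freq[j] >= threshold]
    return stable, freq, n_keep


if __name__ == "__main__":
    rng = random.Random(1)
    X = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(40)]
    # Synthetic example: only columns 0 and 1 are relevant.
    y = [2 * r[0] - 2 * r[1] + rng.gauss(0, 0.1) for r in X]
    stable, freq, n_keep = trimmed_stability_selection(X, y, q=2)
    print(stable, n_keep)
```

Sorting by loss and truncating is the simplest reading of "aggregate only the models with the lowest in-sample losses". In this sketch, setting `trim=0` recovers an untrimmed Stability Selection with the same base selector.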