Cellwise outliers are widespread in data, and traditional robust methods may fail when applied to datasets with such contamination. We propose a variable selection procedure that uses a pairwise robust estimator to obtain an initial empirical covariance matrix among the response and potentially many predictors. We then replace the primary design matrix and the response vector with robust counterparts derived from the estimated covariance matrix. Finally, we apply the adaptive Lasso to obtain the variable selection results. The proposed approach is robust to cellwise outliers in both regular and high-dimensional settings. Empirical results show good performance compared with recently proposed alternative robust approaches, particularly in the challenging setting where contamination rates are high but the magnitude of the outliers is moderate. Real data applications demonstrate the practical utility of the proposed method.
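The three steps described above can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions. The abstract only says "a pairwise robust estimator", so the Gnanadesikan-Kettenring estimator with a MAD scale is used here as one possible choice. No positive-definiteness correction is applied to the pairwise matrix. Instead of building the robust design matrix and response explicitly, the sketch solves the Lasso directly from the robust covariance blocks, which gives the same minimizer for any pair (X~, y~) satisfying X~'X~/n = Sxx and X~'y~/n = sxy. All function names, the penalty level `lam = 0.5`, and the simulated data are hypothetical.

```python
import random
import statistics


def mad_scale(v):
    """Median absolute deviation, scaled for consistency at the normal."""
    med = statistics.median(v)
    return 1.4826 * statistics.median([abs(a - med) for a in v])


def gk_cov(u, v):
    """Gnanadesikan-Kettenring covariance: (s(u+v)^2 - s(u-v)^2) / 4."""
    s_plus = mad_scale([a + b for a, b in zip(u, v)])
    s_minus = mad_scale([a - b for a, b in zip(u, v)])
    return (s_plus ** 2 - s_minus ** 2) / 4.0


def pairwise_cov(columns):
    """Step 1: robust covariance matrix, built one pair of columns at a time."""
    d = len(columns)
    S = [[0.0] * d for _ in range(d)]
    for i in range(d):
        S[i][i] = mad_scale(columns[i]) ** 2
        for j in range(i + 1, d):
            S[i][j] = S[j][i] = gk_cov(columns[i], columns[j])
    return S


def soft(z, t):
    """Soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    return max(z - t, 0.0) - max(-z - t, 0.0)


def weighted_lasso(Sxx, sxy, lam, weights, sweeps=200):
    """Coordinate descent for 0.5*b'Sxx b - sxy'b + lam * sum_j w_j*|b_j|.

    Step 2 (assumed shortcut): working from Sxx and sxy stands in for
    explicitly replacing the design matrix and the response.
    """
    p = len(sxy)
    b = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            r = sxy[j] - sum(Sxx[j][k] * b[k] for k in range(p) if k != j)
            b[j] = soft(r, lam * weights[j]) / Sxx[j][j]
    return b


def robust_adaptive_lasso(x_columns, y, lam):
    """Step 3: adaptive Lasso with weights from an unpenalized initial fit."""
    p = len(x_columns)
    S = pairwise_cov(list(x_columns) + [y])
    Sxx = [row[:p] for row in S[:p]]
    sxy = [S[j][p] for j in range(p)]
    b_init = weighted_lasso(Sxx, sxy, lam=0.0, weights=[1.0] * p)
    weights = [1.0 / (abs(b) + 1e-8) for b in b_init]
    return weighted_lasso(Sxx, sxy, lam=lam, weights=weights)


# Hypothetical simulated data: 4 predictors, 2 of them active.
random.seed(0)
n, beta = 2000, [2.0, 0.0, -1.5, 0.0]
X = [[random.gauss(0, 1) for _ in range(n)] for _ in beta]  # list of columns
y = [sum(b * col[i] for b, col in zip(beta, X)) + 0.5 * random.gauss(0, 1)
     for i in range(n)]


def contaminate(col, rate=0.05, value=10.0):
    """Replace each cell independently with an outlying value."""
    return [value if random.random() < rate else v for v in col]


X_bad = [contaminate(c) for c in X]
y_bad = contaminate(y)

b_hat = robust_adaptive_lasso(X_bad, y_bad, lam=0.5)
selected = [j for j, b in enumerate(b_hat) if b != 0.0]
print("selected predictors:", selected)
```

In practice the penalty level would be chosen by a data-driven criterion rather than fixed, and a pairwise matrix in higher dimensions would likely need a positive-definite correction before this step; both are left out of the sketch.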