The Minimum Covariance Determinant (MCD) method is a widely adopted tool for robust estimation and outlier detection. In this paper, we introduce a new framework for model selection in MCD with spectral embedding, based on the notion of stability. Our best-subset algorithm uses principal component analysis for dimension reduction, statistical depths for effective initialization, and concentration steps for subset refinement. We then construct a bootstrap procedure to estimate the instability of the best-subset algorithm. The parameter combination with minimal instability is the one best suited to high-dimensional outlier detection, while the instability path offers insight into the inlier/outlier structure. We rigorously benchmark the proposed framework against existing MCD variants and illustrate its practical utility on two spectral data sets and a cancer genomics data set.
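The pipeline described above (PCA, depth-based initialization, concentration steps, bootstrap instability) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The concentration step (C-step) follows the classical FAST-MCD idea: refit the mean and covariance on the current subset, then keep the h points with the smallest Mahalanobis distances. Several choices are assumptions made only for illustration: the coordinatewise median/MAD depth used for initialization, running PCA once on the full data, and the instability score. That score is defined here as the average pairwise disagreement in inlier labels across bootstrap fits. The function names are hypothetical.

```python
import numpy as np

def pca_reduce(X, k):
    """Project centered data onto its top-k principal components via SVD."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def depth_init(Z, h):
    """Assumed depth: 1 / (1 + squared distance scaled by coordinatewise
    median and MAD). Returns indices of the h deepest points."""
    med = np.median(Z, axis=0)
    mad = 1.4826 * np.median(np.abs(Z - med), axis=0)
    mad[mad == 0] = 1.0                      # guard against zero spread
    depth = 1.0 / (1.0 + np.sum(((Z - med) / mad) ** 2, axis=1))
    return np.argsort(-depth)[:h]

def mahalanobis_sq(Z, rows):
    """Squared Mahalanobis distances of all rows of Z w.r.t. the fit on Z[rows]."""
    mu = Z[rows].mean(axis=0)
    Sinv = np.linalg.pinv(np.atleast_2d(np.cov(Z[rows], rowvar=False)))
    diff = Z - mu
    return np.einsum("ij,jk,ik->i", diff, Sinv, diff)

def c_steps(Z, subset, max_iter=100):
    """Concentration steps: when the subset covariance is nonsingular, each
    step does not increase its determinant; stop when the subset is stable."""
    h = len(subset)
    subset = np.sort(subset)
    for _ in range(max_iter):
        new = np.sort(np.argsort(mahalanobis_sq(Z, subset))[:h])
        if np.array_equal(new, subset):
            break
        subset = new
    return subset

def fit_labels(Z, idx, h):
    """Fit on bootstrap rows Z[idx]; label the h closest original rows as inliers."""
    Zb = Z[idx]
    sub = c_steps(Zb, depth_init(Zb, h))
    mu = Zb[sub].mean(axis=0)
    Sinv = np.linalg.pinv(np.atleast_2d(np.cov(Zb[sub], rowvar=False)))
    diff = Z - mu
    d2 = np.einsum("ij,jk,ik->i", diff, Sinv, diff)
    labels = np.zeros(len(Z), dtype=bool)
    labels[np.argsort(d2)[:h]] = True
    return labels

def instability(X, k, alpha=0.75, B=20, seed=0):
    """Hypothetical instability score for one (k, alpha) combination:
    mean fraction of points whose inlier label differs between bootstrap fits."""
    rng = np.random.default_rng(seed)
    Z = pca_reduce(X, k)
    n = len(Z)
    h = int(np.ceil(alpha * n))
    L = [fit_labels(Z, rng.integers(0, n, n), h) for _ in range(B)]
    dis = [np.mean(L[a] != L[b]) for a in range(B) for b in range(a + 1, B)]
    return float(np.mean(dis))
```

Under this sketch, model selection amounts to evaluating `instability` over a grid of (k, alpha) values and keeping the minimizer. Plotting the scores against alpha gives a rough analogue of an instability path.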