Many methods for detecting outliers in multivariate data exist in the literature, but most of them require estimating the covariance matrix. The higher the dimension, the harder this estimation becomes, and in high dimensions it is impossible. To avoid estimating this matrix, we propose a novel random projection-based procedure to detect outliers in Gaussian multivariate data. It consists of projecting the data onto several one-dimensional subspaces, in each of which an appropriate univariate outlier detection method is applied; this method is similar to Tukey's method but uses a threshold that depends on the initial dimension and the sample size. The required number of projections is determined using sequential analysis. Simulated and real datasets illustrate the performance of the proposed method.
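The projection-and-fence idea described in the abstract can be sketched roughly as follows. This is a minimal illustration in Python with NumPy, not the authors' procedure: the function name `random_projection_outliers`, the fixed `n_projections`, and the fence multiplier `k` are all assumptions made here. The abstract derives the threshold from the dimension and the sample size and chooses the number of projections by sequential analysis, and neither rule is reproduced below. The rule of flagging a point when it falls outside the fences on at least one projection is likewise an assumption.

```python
import numpy as np


def random_projection_outliers(X, n_projections=100, k=1.5, seed=0):
    """Flag rows of X lying outside Tukey-style fences on any random 1-D projection.

    n_projections and k are illustrative placeholders (assumptions): the
    abstract selects the number of projections by sequential analysis and
    the threshold from the dimension and sample size.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Draw Gaussian directions and normalize each column to unit length.
    directions = rng.standard_normal((d, n_projections))
    directions /= np.linalg.norm(directions, axis=0)
    # Project every observation onto every direction: shape (n, n_projections).
    projected = X @ directions
    # Tukey-style fences computed separately for each projection.
    q1, q3 = np.percentile(projected, [25, 75], axis=0)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    outside = (projected < lower) | (projected > upper)
    # Assumed decision rule: outlier if outside the fences on any projection.
    return outside.any(axis=1)


# Usage on synthetic Gaussian data with one planted gross outlier.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 50))
X[0] += 10.0  # shift the first observation by 10 in every coordinate
flags = random_projection_outliers(X, n_projections=200, k=3.0)
```

No covariance matrix is estimated anywhere in this sketch: only univariate quartiles of the projected data are needed, which is what makes the approach attractive when the dimension is large relative to the sample size.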