The minimum regularized covariance determinant (MRCD) method is a robust estimator of multivariate location and scatter that detects outliers by fitting a robust covariance matrix to the data. Its regularization ensures that the covariance matrix is well-conditioned in any dimension. The MRCD assumes that the non-outlying observations are roughly elliptically distributed, but many datasets are not of that form. Moreover, the computation time of the MRCD increases substantially as the number of variables grows, and datasets with many variables are now common. The proposed Kernel Minimum Regularized Covariance Determinant (KMRCD) estimator addresses both issues. It is not restricted to elliptical data because it implicitly computes the MRCD estimates in a kernel-induced feature space. A fast algorithm is constructed that starts from kernel-based initial estimates and exploits the kernel trick to speed up the subsequent computations. Based on the KMRCD estimates, a rule is proposed to flag outliers. The KMRCD algorithm performs well in simulations and is illustrated on real-life data.
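To make the role of the kernel trick concrete, the following is a minimal sketch of how a regularized Mahalanobis-type distance in feature space can be computed from a Gram matrix alone. It is not the KMRCD algorithm: the subset of presumed clean points is fixed by hand rather than searched for, and the regularized scatter is assumed to be the convex combination (1 - rho) * S + rho * I of the subset covariance S and the identity, with rho chosen by the user. The function names `rbf_kernel`, `linear_kernel` and `kernel_reg_mahalanobis`, and the parameters `gamma` and `rho`, are illustrative and do not come from the abstract.

```python
import numpy as np


def rbf_kernel(A, B, gamma=0.5):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def linear_kernel(A, B):
    """Linear kernel; the feature space is then the input space itself."""
    return A @ B.T


def kernel_reg_mahalanobis(K, subset, rho):
    """Squared regularized Mahalanobis distances in feature space.

    K      : (n, n) Gram matrix of all observations.
    subset : integer indices of the h observations used for location and scatter.
    rho    : regularization weight in (0, 1); assumed scatter is (1-rho)*S + rho*I.

    Only kernel evaluations are used (Woodbury identity), so the feature map
    is never formed explicitly.
    """
    if not 0.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between 0 and 1")
    subset = np.asarray(subset)
    h = len(subset)
    K_sub = K[np.ix_(subset, subset)]      # (h, h) kernel among subset points
    K_all = K[:, subset]                   # (n, h) kernel of all points vs subset
    col_mean = K_sub.mean(axis=0)          # mean over the subset of k(x_l, x_j)
    row_mean = K_all.mean(axis=1)          # mean over the subset of k(x_i, x_l)
    grand = K_sub.mean()
    # Center in feature space around the subset mean.
    Kc_sub = K_sub - col_mean[None, :] - col_mean[:, None] + grand
    Kc_all = K_all - row_mean[:, None] - col_mean[None, :] + grand
    kc_diag = np.diag(K) - 2.0 * row_mean + grand
    # Woodbury: inverse of rho*I + (1-rho)/h * Phi^T Phi expressed via Kc_sub.
    c = rho * h / (1.0 - rho)
    sol = np.linalg.solve(Kc_sub + c * np.eye(h), Kc_all.T)   # (h, n)
    return (kc_diag - np.einsum("ij,ji->i", Kc_all, sol)) / rho


# Illustrative usage on synthetic data with one planted outlier.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
X[-1] = [8.0, 8.0]
K = rbf_kernel(X, X, gamma=0.5)
d2 = kernel_reg_mahalanobis(K, subset=np.arange(40), rho=0.2)
print("distance of planted point:", d2[-1], "median distance:", np.median(d2))
```

With a linear kernel the same function reproduces the ordinary regularized Mahalanobis distance computed from the explicit covariance matrix, which gives a simple way to check the kernelized formula. The linear system it solves has size h by h, independent of the number of variables.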