This article proposes a new class of Real Elliptically Skewed (RESK) distributions and associated clustering algorithms that integrate robustness and skewness into a single, unified cluster analysis framework. Asymmetrically distributed and heavy-tailed data clusters have been reported in a variety of real-world applications. Robustness is essential because a few outlying observations can severely obscure the cluster structure. The RESK distributions are a generalization of the Real Elliptically Symmetric (RES) distributions. To estimate the cluster parameters and memberships, we derive an expectation-maximization (EM) algorithm for arbitrary RESK distributions. Special attention is given to a new robust skew-Huber M-estimator, which is also the maximum likelihood estimator (MLE) for the skew-Huber distribution, a member of the RESK class. Numerical experiments on simulated and real-world data confirm the usefulness of the proposed methods for skewed and heavy-tailed data sets.
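The abstract describes the EM algorithm only at a high level. To illustrate the kind of robust update such an algorithm builds on, here is a minimal sketch of EM for a one-dimensional mixture of *symmetric* Huber distributions, i.e. a simple RES special case with no skewness term. It is not the paper's RESK algorithm or its skew-Huber estimator. Assumptions: the tuning constant c = 1.345 (a common Huber default), no consistency correction on the scale, a single reweighting step per M-step, and the function names (`huber_rho`, `huber_weight`, `huber_logpdf`, `em_huber_mixture`) are hypothetical. The key idea is that the weight rho'(t), the derivative of the Huber loss, turns the location update into a weighted mean that down-weights distant points.

```python
import math


def huber_rho(t, c):
    """Huber loss applied to a squared standardized distance t = z**2."""
    return t if t <= c * c else 2.0 * c * math.sqrt(t) - c * c


def huber_weight(t, c):
    """rho'(t): weight 1 near the center, c/sqrt(t) in the tails (down-weights outliers)."""
    return 1.0 if t <= c * c else c / math.sqrt(t)


def huber_logpdf(x, mu, sigma, c):
    """Log-density of a 1-D Huber distribution, f(x) = exp(-rho(z**2)/2) / (sigma * K(c))."""
    # K(c): Gaussian core on [-c, c] plus two exponential tails.
    k_c = math.sqrt(2.0 * math.pi) * math.erf(c / math.sqrt(2.0)) + 2.0 * math.exp(-c * c / 2.0) / c
    t = ((x - mu) / sigma) ** 2
    return -math.log(sigma) - math.log(k_c) - 0.5 * huber_rho(t, c)


def em_huber_mixture(xs, mus, sigmas, c=1.345, n_iter=100):
    """Hypothetical sketch: EM for a 1-D mixture of symmetric Huber distributions (no skewness)."""
    k = len(mus)
    mus, sigmas = list(mus), list(sigmas)
    pis = [1.0 / k] * k
    resp = []
    for _ in range(n_iter):
        # E-step: posterior cluster memberships, computed in log space for stability.
        resp = []
        for x in xs:
            logs = [math.log(pis[j]) + huber_logpdf(x, mus[j], sigmas[j], c) for j in range(k)]
            top = max(logs)
            unnorm = [math.exp(v - top) for v in logs]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])
        # M-step: mixing proportions, then one reweighting step for location and scale.
        for j in range(k):
            tau = [r[j] for r in resp]
            n_j = sum(tau)
            pis[j] = n_j / len(xs)
            # Weighted mean: tail points get weight c/|z| < 1.
            w = [huber_weight(((x - mus[j]) / sigmas[j]) ** 2, c) for x in xs]
            mus[j] = (sum(tj * wi * x for tj, wi, x in zip(tau, w, xs))
                      / sum(tj * wi for tj, wi in zip(tau, w)))
            # Scale: weighted second moment around the updated mean.
            w = [huber_weight(((x - mus[j]) / sigmas[j]) ** 2, c) for x in xs]
            sigmas[j] = math.sqrt(sum(tj * wi * (x - mus[j]) ** 2
                                      for tj, wi, x in zip(tau, w, xs)) / n_j)
    # Hard labels from the memberships of the final E-step.
    labels = [max(range(k), key=r.__getitem__) for r in resp]
    return mus, sigmas, pis, labels


# Two well-separated clusters plus one gross outlier at 30.0.
data = ([-1.0 + 2.0 * i / 19 for i in range(20)]
        + [9.0 + 2.0 * i / 19 for i in range(20)]
        + [30.0])
mus, sigmas, pis, labels = em_huber_mixture(data, mus=[1.0, 8.0], sigmas=[2.0, 2.0])
print([round(m, 2) for m in mus], [round(s, 2) for s in sigmas])
```

For comparison, the plain average of the 21 points assigned to the second cluster is about 10.95; the Huber weights bound the outlier's pull, so the robust location estimate stays much closer to 10. The second cluster's scale estimate still comes out noticeably larger than the first's, because the outlier continues to contribute to it, which shows that the choice of c (and any correction factor) matters in practice.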