Robust PCA for High Dimensional Data based on Characteristic Transformation

In this paper, we propose a novel robust Principal Component Analysis (PCA) for high-dimensional data in the presence of various heterogeneities, especially heavy tails and outliers. A transformation motivated by the characteristic function is constructed to improve the robustness of classical PCA. Besides handling typical outliers, the proposed method has the unique advantage of dealing with heavy-tailed data whose covariances may not exist (they may be infinite, for instance). The proposed approach is also a special case of kernel principal component analysis (KPCA), and it gains its robustness and non-linearity from a bounded, non-linear kernel function. The merits of the new method are illustrated by several statistical properties, including an upper bound on the excess error and the behavior of the large eigenvalues under a spiked covariance model. In addition, we show the advantages of our method over classical PCA through a variety of simulations. Finally, we apply the new robust PCA to classify mice with different genotypes in a biological study based on their protein expression data, and find that our method is more accurate than classical PCA at identifying abnormal mice.
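The abstract does not give the paper's actual transformation, so the following is only a minimal sketch of the general idea of kernel PCA with a bounded kernel. It assumes a Gaussian kernel (which is itself the characteristic function of a Gaussian distribution evaluated at the difference of two points) as a stand-in; the function name `gaussian_kernel_pca`, the `bandwidth` value, and the synthetic Cauchy data are all illustrative assumptions, not the authors' method.

```python
import numpy as np

def gaussian_kernel_pca(X, n_components, bandwidth):
    """Kernel PCA with a Gaussian kernel (an assumed stand-in for a bounded kernel)."""
    # Pairwise squared Euclidean distances between rows of X
    diff = X[:, None, :] - X[None, :, :]
    sq_dists = np.sum(diff ** 2, axis=2)
    # Every kernel entry lies in [0, 1], however extreme the observations are
    K = np.exp(-sq_dists / (2.0 * bandwidth ** 2))
    # Center the kernel matrix in feature space: Kc = H K H
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    Kc = H @ K @ H
    # Eigen-decomposition of the symmetric centered kernel matrix
    eigvals, eigvecs = np.linalg.eigh(Kc)
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Kernel principal component scores of the training points
    scores = eigvecs * np.sqrt(np.maximum(eigvals, 0.0))
    return K, eigvals, scores

rng = np.random.default_rng(0)
# Heavy-tailed synthetic data: Cauchy draws have no finite variance
X = rng.standard_cauchy(size=(200, 5))
K, eigvals, scores = gaussian_kernel_pca(X, n_components=2, bandwidth=2.0)
```

Because the kernel is bounded, the matrix being decomposed stays finite and well scaled even when the sample covariance of `X` is dominated by a few extreme draws. How the paper chooses its transformation and tuning parameters is not stated in the abstract.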

Related content

PCA

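In statistics, principal component analysis (PCA) is a method that projects data from a higher-dimensional space into a lower-dimensional space by maximizing the variance along each dimension. Given a collection of points in two, three, or more dimensions, a "best-fitting" line can be defined as the line that minimizes the average squared distance from the points to the line. The next best-fitting line can be chosen in the same way from the directions perpendicular to the first. Repeating this process yields an orthogonal basis in which the individual dimensions of the data are uncorrelated. These basis vectors are called the principal components.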
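As a minimal sketch of this description, classical PCA can be computed from the singular value decomposition of the centered data. The helper name `pca` and the synthetic two-dimensional data are assumptions made for illustration.

```python
import numpy as np

def pca(X, n_components):
    """Classical PCA of the rows of X via SVD of the centered data."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are orthonormal directions, ordered by decreasing variance
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    # Sample variance of the data along each principal direction
    explained_variance = S[:n_components] ** 2 / (X.shape[0] - 1)
    # Coordinates of each point in the principal component basis
    scores = Xc @ components.T
    return components, explained_variance, scores

rng = np.random.default_rng(0)
# Synthetic 2-D points spread mainly along the direction (1, 1)
t = rng.normal(size=500)
X = 3.0 * np.column_stack([t, t]) + rng.normal(scale=0.3, size=(500, 2))
components, explained_variance, scores = pca(X, n_components=2)
```

Here the first row of `components` plays the role of the best-fitting line, the second row is perpendicular to it, and the columns of `scores` are uncorrelated.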
