Empirical Bayes PCA in high dimensions

When the dimension of data is comparable to or larger than the number of data samples, Principal Components Analysis (PCA) may exhibit problematic high-dimensional noise. In this work, we propose an Empirical Bayes PCA method that reduces this noise by estimating a joint prior distribution for the principal components. EB-PCA is based on the classical Kiefer-Wolfowitz nonparametric MLE for empirical Bayes estimation, distributional results derived from random matrix theory for the sample PCs, and iterative refinement using an Approximate Message Passing (AMP) algorithm. In theoretical "spiked" models, EB-PCA achieves Bayes-optimal estimation accuracy in the same settings as an oracle Bayes AMP procedure that knows the true priors. Empirically, EB-PCA significantly improves over PCA when there is strong prior structure, both in simulation and on quantitative benchmarks constructed from the 1000 Genomes Project and the International HapMap Project. An illustration is presented for analysis of gene expression data obtained by single-cell RNA-seq.
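To make the empirical Bayes ingredient concrete, here is a minimal sketch in Python; it is not the authors' implementation. It approximates the Kiefer-Wolfowitz nonparametric MLE by running EM over mixing weights on a fixed grid, then denoises each observation with the estimated posterior mean. The scalar Gaussian-noise setup, the two-point prior, the known noise level `sigma`, and the function names `npmle_weights` and `posterior_mean` are all assumptions made for illustration. The random-matrix calibration and the AMP refinement are omitted.

```python
import numpy as np


def npmle_weights(y, grid, sigma, n_iter=200):
    """Grid-based EM approximation to the NPMLE of a prior (assumed setup).

    Model assumed here: y_i = theta_i + sigma * z_i, z_i ~ N(0, 1),
    with theta_i drawn i.i.d. from an unknown prior supported on `grid`.
    """
    # lik[i, j] = Gaussian density of y_i given theta = grid[j]
    diff = y[:, None] - grid[None, :]
    lik = np.exp(-0.5 * (diff / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

    # Start from uniform weights and iterate the EM fixed-point update.
    w = np.full(grid.size, 1.0 / grid.size)
    for _ in range(n_iter):
        post = lik * w
        post /= post.sum(axis=1, keepdims=True)  # posterior over grid points
        w = post.mean(axis=0)                    # updated mixing weights
    return w, lik


def posterior_mean(lik, w, grid):
    """Posterior mean of theta_i under the estimated prior."""
    post = lik * w
    post /= post.sum(axis=1, keepdims=True)
    return post @ grid


# Synthetic data with strong prior structure: theta is -2 or +2.
rng = np.random.default_rng(0)
n, sigma = 2000, 1.0
truth = rng.choice([-2.0, 2.0], size=n)
y = truth + sigma * rng.normal(size=n)

grid = np.linspace(y.min(), y.max(), 101)
w, lik = npmle_weights(y, grid, sigma)
denoised = posterior_mean(lik, w, grid)

mse_raw = np.mean((y - truth) ** 2)
mse_eb = np.mean((denoised - truth) ** 2)
print(f"MSE of raw observations: {mse_raw:.3f}")
print(f"MSE after EB denoising:  {mse_eb:.3f}")
```

With a strongly structured prior such as this two-point one, `mse_eb` should come out well below `mse_raw`, which is the kind of gain the abstract attributes to prior structure.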

PCA

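In statistics, principal component analysis (PCA) is a method that projects data from a higher-dimensional space into a lower-dimensional one by maximizing the variance captured along each new dimension. Given a set of points in two, three, or more dimensions, a "best-fit" line can be defined as the line that minimizes the average squared distance from the points to the line. The next best-fit line is chosen in the same way from the directions perpendicular to the first. Repeating this process yields an orthogonal basis in which the individual dimensions of the data are uncorrelated. These basis vectors are called the principal components.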
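The construction above can be sketched in a few lines of Python using the singular value decomposition of the centered data matrix. The function name `pca`, the synthetic data, and the mixing matrix `A` are assumptions chosen for illustration.

```python
import numpy as np


def pca(X, k):
    """Return the top-k principal components, scores, and variances.

    X is an (n_samples, n_features) array; rows are observations.
    """
    Xc = X - X.mean(axis=0)  # center each feature
    # Rows of Vt are orthonormal directions ordered by decreasing variance.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]
    scores = Xc @ components.T               # coordinates in the new basis
    explained_var = s[:k] ** 2 / (X.shape[0] - 1)
    return components, scores, explained_var


# Synthetic 3-D data with correlated features (hypothetical example).
rng = np.random.default_rng(0)
A = np.array([[3.0, 1.0, 0.0],
              [0.0, 1.0, 0.5],
              [0.0, 0.0, 0.2]])
X = rng.normal(size=(200, 3)) @ A

components, scores, explained_var = pca(X, k=2)
print("components:\n", components)
print("explained variance:", explained_var)
print("score covariance:\n", np.cov(scores.T))
```

The printed score covariance should be diagonal up to floating-point error, reflecting that the projected dimensions are uncorrelated.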
