Inference for Heteroskedastic PCA with Missing Data

This paper studies how to construct confidence regions for principal component analysis (PCA) in high dimension, a problem that has been vastly under-explored. While computing measures of uncertainty for nonlinear/nonconvex estimators is in general difficult in high dimension, the challenge is further compounded by the prevalent presence of missing data and heteroskedastic noise. We propose a suite of solutions to perform valid inference on the principal subspace based on two estimators: a vanilla SVD-based approach, and a more refined iterative scheme called $\textsf{HeteroPCA}$ (Zhang et al., 2018). We develop non-asymptotic distributional guarantees for both estimators, and demonstrate how these can be invoked to compute both confidence regions for the principal subspace and entrywise confidence intervals for the spiked covariance matrix. Particularly worth highlighting is the inference procedure built on top of $\textsf{HeteroPCA}$, which is not only valid but also statistically efficient for broader scenarios (e.g., it covers a wider range of missing rates and signal-to-noise ratios). Our solutions are fully data-driven and adaptive to heteroskedastic random noise, without requiring prior knowledge about the noise levels and noise distributions.
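To make the two estimators concrete, here is a minimal NumPy sketch of the point-estimation step only; it does not construct the confidence regions or intervals discussed in the abstract. It assumes the diagonal-deletion idea usually associated with $\textsf{HeteroPCA}$: drop the diagonal of the Gram matrix, where heteroskedastic noise accumulates, and re-impute it iteratively from a rank-$r$ fit. The inverse-probability rescaling of the observed entries, the fixed iteration count, the simulated data, and all function names are illustrative assumptions rather than details taken from the paper.

```python
import numpy as np

def top_r_eig(G, r):
    """Top-r eigenpairs of a symmetric matrix, ranked by absolute eigenvalue."""
    w, V = np.linalg.eigh(G)
    idx = np.argsort(np.abs(w))[::-1][:r]
    return V[:, idx], w[idx]

def vanilla_svd_subspace(Y, r):
    """Vanilla estimator: top-r left singular vectors of the data matrix."""
    U, _, _ = np.linalg.svd(Y, full_matrices=False)
    return U[:, :r]

def heteropca_sketch(Y, r, n_iter=50):
    """Diagonal-deletion sketch (assumed form): iteratively re-impute the Gram diagonal."""
    G = Y @ Y.T
    off_diag = G - np.diag(np.diag(G))      # remove the noise-contaminated diagonal
    G_t = off_diag.copy()
    for _ in range(n_iter):                 # fixed iteration count is an assumption
        V, w = top_r_eig(G_t, r)
        low_rank = (V * w) @ V.T            # rank-r approximation of the current iterate
        G_t = off_diag + np.diag(np.diag(low_rank))
    V, _ = top_r_eig(G_t, r)
    return V

def subspace_dist(U1, U2):
    """Spectral-norm distance between the projectors onto two subspaces."""
    return np.linalg.norm(U1 @ U1.T - U2 @ U2.T, 2)

# Simulated example (all sizes and noise levels are illustrative assumptions).
rng = np.random.default_rng(0)
d, n, r, p = 30, 5000, 2, 0.6
U_star, _ = np.linalg.qr(rng.normal(size=(d, r)))    # true principal subspace
factors = 4.0 * rng.normal(size=(r, n))              # latent factors
noise_std = np.linspace(0.2, 2.0, d)[:, None]        # heteroskedastic noise levels
X = U_star @ factors + noise_std * rng.normal(size=(d, n))
mask = rng.random(size=(d, n)) < p                   # entries observed with probability p
Y = np.where(mask, X, 0.0) / p                       # inverse-probability rescaling

U_van = vanilla_svd_subspace(Y, r)
U_hat = heteropca_sketch(Y, r)
print("vanilla SVD error:", subspace_dist(U_van, U_star))
print("diagonal-deletion error:", subspace_dist(U_hat, U_star))
```

The two printed errors can be compared directly; how much the diagonal correction helps depends on how uneven the noise levels are and on the missing rate.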


PCA

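In statistics, principal component analysis (PCA) is a method for projecting data from a higher-dimensional space into a lower-dimensional space by maximizing the variance captured along each dimension. Given a set of points in two, three, or more dimensions, a "best-fit" line can be defined as the line that minimizes the average squared distance from the points to the line. The next best-fit line is chosen in the same way from the directions perpendicular to the first. Repeating this process yields an orthogonal basis in which the individual dimensions of the data are uncorrelated. These basis vectors are called principal components.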
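The description above can be illustrated with a short NumPy sketch that computes principal components from the singular value decomposition of the centered data. The function name `pca_svd` and the toy data are illustrative assumptions.

```python
import numpy as np

def pca_svd(X, r):
    """Top-r principal directions, scores, and variances of X (n samples x d features)."""
    Xc = X - X.mean(axis=0)                    # center each feature
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:r]                        # rows are orthonormal principal directions
    scores = Xc @ components.T                 # coordinates of each sample in the new basis
    explained_var = s[:r] ** 2 / (X.shape[0] - 1)
    return components, scores, explained_var

# Toy data: points scattered tightly around the line through (1, 2).
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([t, 2 * t]) + 0.01 * rng.normal(size=(200, 2))

components, scores, explained_var = pca_svd(X, 2)
print("first principal direction:", components[0])   # parallel to (1, 2) up to sign
print("explained variances:", explained_var)
```

The first row of `components` is the best-fit line direction, and the columns of `scores` are uncorrelated, matching the construction described in the text.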
