Deflated HeteroPCA: Overcoming the curse of ill-conditioning in heteroskedastic PCA

This paper is concerned with estimating the column subspace of a low-rank matrix $\boldsymbol{X}^\star \in \mathbb{R}^{n_1\times n_2}$ from contaminated data. How to obtain optimal statistical accuracy while accommodating the widest range of signal-to-noise ratios (SNRs) becomes particularly challenging in the presence of heteroskedastic noise and unbalanced dimensionality (i.e., $n_2\gg n_1$). While the state-of-the-art algorithm $\textsf{HeteroPCA}$ emerges as a powerful solution for solving this problem, it suffers from "the curse of ill-conditioning," namely, its performance degrades as the condition number of $\boldsymbol{X}^\star$ grows. In order to overcome this critical issue without compromising the range of allowable SNRs, we propose a novel algorithm, called $\textsf{Deflated-HeteroPCA}$, that achieves near-optimal and condition-number-free theoretical guarantees in terms of both $\ell_2$ and $\ell_{2,\infty}$ statistical accuracy. The proposed algorithm divides the spectrum of $\boldsymbol{X}^\star$ into well-conditioned and mutually well-separated subblocks, and applies $\textsf{HeteroPCA}$ to conquer each subblock successively. Further, an application of our algorithm and theory to two canonical examples -- the factor model and tensor PCA -- leads to remarkable improvement for each application.
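The divide-and-conquer idea described above can be illustrated with a short NumPy sketch. It is not the authors' implementation. It assumes the commonly described form of $\textsf{HeteroPCA}$ (delete the diagonal of the Gram matrix $\boldsymbol{Y}\boldsymbol{Y}^\top$, then repeatedly re-impute it from a rank-$r$ approximation), and the block-selection rule is a hypothetical simplification: it only enforces a bounded ratio between singular values within a block and omits any spectral-gap check. The names `top_r_approx`, `hetero_pca`, `deflated_hetero_pca`, and the parameters `n_iter` and `ratio` are illustrative choices, not taken from the abstract.

```python
import numpy as np

def top_r_approx(G, r):
    """Rank-r approximation of a symmetric matrix and the eigenvectors used."""
    vals, vecs = np.linalg.eigh(G)
    idx = np.argsort(np.abs(vals))[::-1][:r]   # r eigenvalues largest in magnitude
    U = vecs[:, idx]
    return (U * vals[idx]) @ U.T, U

def hetero_pca(G, r, n_iter=50):
    """Assumed HeteroPCA step: re-impute diag(G) from its rank-r approximation."""
    G = G.copy()
    for _ in range(n_iter):
        G_r, _ = top_r_approx(G, r)
        np.fill_diagonal(G, np.diag(G_r))
    _, U = top_r_approx(G, r)
    return U, G

def deflated_hetero_pca(Y, r, n_iter=50, ratio=4.0):
    """Sketch: grow the working rank block by block, warm-starting each stage."""
    G = Y @ Y.T
    np.fill_diagonal(G, 0.0)                   # diagonal deletion
    r_prev = 0
    while r_prev < r:
        s = np.linalg.svd(G, compute_uv=False) # descending singular values
        # Hypothetical rule: extend the block while it stays well-conditioned.
        r_k = r_prev + 1
        for cand in range(r_prev + 1, r + 1):
            if s[r_prev] <= ratio * s[cand - 1]:
                r_k = cand
        U, G = hetero_pca(G, r_k, n_iter)      # conquer this block, keep G
        r_prev = r_k
    return U                                   # n1 x r orthonormal basis estimate
```

Each stage reuses the diagonal imputed by the previous stage, so later, weaker spectral blocks are estimated after the dominant ones have been accounted for. The returned matrix is only an estimate of the column subspace; its columns are not expected to match the true singular vectors individually.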

