We consider the problem of quantifying uncertainty for the estimation error of the leading eigenvector from Oja's algorithm for streaming principal component analysis, where the data are generated IID from some unknown distribution. By combining classical tools from the U-statistics literature with recent results on high-dimensional central limit theorems for quadratic forms of random vectors and concentration of matrix products, we establish a weighted $\chi^2$ approximation result for the $\sin^2$ error between the population eigenvector and the output of Oja's algorithm. Since estimating the covariance matrix associated with the approximating distribution requires knowledge of unknown model parameters, we propose a multiplier bootstrap algorithm that may be updated in an online manner. We establish conditions under which the bootstrap distribution is close to the corresponding sampling distribution with high probability, thereby establishing the bootstrap as a consistent inferential method in an appropriate asymptotic regime.
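For intuition, the following is a minimal Python sketch of Oja's streaming update for the leading eigenvector, run alongside a generic online multiplier bootstrap. It is an illustration under stated assumptions and not necessarily the algorithm analyzed in this work: the function names `sin2_error` and `oja_with_bootstrap`, the constant step size `eta`, the Gaussian multiplier weights with mean 1 and variance 1, and the choice to reweight each rank-one update are all hypothetical. Each bootstrap replicate is updated from the same data point as the main iterate, so no pass over past data is needed.

```python
import numpy as np


def sin2_error(u, v):
    """sin^2 of the angle between the directions spanned by u and v."""
    u = u / np.linalg.norm(u)
    v = v / np.linalg.norm(v)
    return 1.0 - float(u @ v) ** 2


def oja_with_bootstrap(X, eta, n_boot=200, seed=0):
    """Run Oja's update over the rows of X, with n_boot perturbed replicates.

    Assumed (illustrative) scheme: replicate b rescales its i-th update by
    an independent weight w with mean 1 and variance 1.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape

    # Shared random unit-norm initialization for the main iterate
    # and for every bootstrap replicate.
    v0 = rng.standard_normal(d)
    v0 /= np.linalg.norm(v0)
    v = v0.copy()
    V = np.tile(v0, (n_boot, 1))  # shape (n_boot, d)

    for x in X:
        # Oja's update: v <- normalize(v + eta * x x^T v).
        v = v + eta * x * (x @ v)
        v /= np.linalg.norm(v)

        # Bootstrap replicates: same data point, randomly reweighted step.
        w = 1.0 + rng.standard_normal(n_boot)
        V = V + eta * (w * (V @ x))[:, None] * x[None, :]
        V /= np.linalg.norm(V, axis=1, keepdims=True)

    # sin^2 error of each replicate relative to the Oja estimate;
    # clipping only guards against floating-point round-off.
    boot_errors = np.clip(1.0 - (V @ v) ** 2, 0.0, 1.0)
    return v, boot_errors
```

In a sketch like this, an empirical quantile of the replicate errors, for example `np.quantile(boot_errors, 0.95)`, would serve as a rough stand-in for the corresponding quantile of the $\sin^2$ error between the Oja estimate and the population eigenvector. Whether that substitution is valid depends on the conditions established for the actual bootstrap procedure.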