We consider the problem of quantifying uncertainty for the estimation error of the leading eigenvector from Oja's algorithm for streaming principal component analysis, where the data are generated IID from some unknown distribution. By combining classical tools from the U-statistics literature with recent results on high-dimensional central limit theorems for quadratic forms of random vectors and concentration of matrix products, we establish a $\chi^2$ approximation result for the $\sin^2$ error between the population eigenvector and the output of Oja's algorithm. Since estimating the covariance matrix associated with the approximating distribution requires knowledge of unknown model parameters, we propose a multiplier bootstrap algorithm that may be updated in an online manner. We establish conditions under which the bootstrap distribution is close to the corresponding sampling distribution with high probability, thereby establishing the bootstrap as a consistent inferential method in an appropriate asymptotic regime.