Consider a set of points sampled independently near a smooth compact submanifold of Euclidean space. We provide mathematically rigorous bounds on the number of sample points required to estimate both the dimension and the tangent spaces of that manifold with high confidence. The estimation algorithm is Local PCA, a local version of principal component analysis. Our results accommodate noisy, non-uniformly distributed data, with noise that may vary across the manifold, and allow simultaneous estimation at multiple points. Crucially, all of the constants appearing in our bounds are explicitly described. The proof uses a matrix concentration inequality to estimate covariance matrices and a Wasserstein distance bound to quantify the nonlinearity of the underlying manifold and the non-uniformity of the probability measure.
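To make the Local PCA step concrete, the following is a minimal sketch of one common form of the procedure, not the paper's exact algorithm or its constants. The function name `local_pca`, the neighborhood `radius`, the choice to center the covariance at the query point, and the eigenvalue cutoff `gap_ratio` are all illustrative assumptions; the abstract does not specify how the dimension is read off from the spectrum.

```python
import numpy as np


def local_pca(points, center, radius, gap_ratio=0.1):
    """Estimate the intrinsic dimension and a tangent basis at `center`.

    Hypothetical sketch: `radius` and `gap_ratio` are illustrative
    parameters, not values taken from the paper.
    """
    # Displacements from the query point to every sample.
    diffs = points - center
    # Keep only the samples inside the ball of the given radius.
    nbrs = diffs[np.linalg.norm(diffs, axis=1) <= radius]
    if nbrs.shape[0] < 2:
        raise ValueError("not enough neighbors inside the ball")
    # Local covariance, centered at the query point (an assumption;
    # centering at the local mean is another common choice).
    cov = nbrs.T @ nbrs / nbrs.shape[0]
    # eigh returns eigenvalues in ascending order; reverse to descending.
    eigvals, eigvecs = np.linalg.eigh(cov)
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]
    # Count eigenvalues that are large relative to the top one
    # (a simple thresholding rule, assumed for illustration).
    dim = int(np.sum(eigvals > gap_ratio * eigvals[0]))
    # The leading `dim` eigenvectors span the estimated tangent space.
    return dim, eigvecs[:, :dim]


# Example: noisy samples near the unit circle embedded in R^3.
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, size=2000)
circle = np.stack([np.cos(theta), np.sin(theta), np.zeros_like(theta)], axis=1)
samples = circle + rng.normal(scale=0.01, size=circle.shape)

dim, basis = local_pca(samples, center=np.array([1.0, 0.0, 0.0]), radius=0.3)
print(dim, basis.ravel())
```

In this toy setting the radius has to be small relative to the curvature of the circle yet large relative to the noise level; the bounds described in the abstract are what make that kind of trade-off, and the number of samples it requires, quantitative.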