An emerging trend in non-Euclidean statistical analysis is to recover a low-dimensional structure, namely a manifold, underlying high-dimensional data. Recovering the manifold requires the noise to be suitably concentrated. Existing methods address this problem by constructing an approximating manifold from tangent spaces estimated at each sample point. These methods have theoretical convergence guarantees, but the guarantees assume that the samples are either noiseless or corrupted by bounded noise. When the noise is unbounded, which is a common scenario, the tangent spaces estimated at the noisy samples are blurred, and fitting a manifold from these blurred tangent spaces may increase the inaccuracy. In this paper, we introduce a new manifold-fitting method. It constructs the output manifold by directly estimating the tangent spaces at the projected points on the underlying manifold, rather than at the sample points, which reduces the error caused by the noise. Under unbounded noise, our method achieves theoretical convergence with high probability, in the form of an upper bound on the distance between the estimated and the underlying manifold. We also evaluate the smoothness of the estimated manifold by bounding the supremum of its second-order difference from above. Numerical simulations are provided to validate our theoretical findings and to demonstrate the advantages of our method over other relevant manifold-fitting methods. Finally, we apply our method to real data examples.
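To make the tangent-space step concrete, the sketch below uses local PCA, a common generic way to estimate a tangent space at a point. It takes the samples inside a Euclidean ball, centres them, and keeps the top singular directions. This is not the estimator proposed in this paper. The helper names `local_pca_tangent` and `tangent_error` are hypothetical. The toy setting is also an illustrative assumption: a unit circle in the plane, Gaussian noise with sigma = 0.1, and ball radius 0.3. In the demo, the projection of a noisy sample onto the circle is computed with the known circle. That oracle is unavailable in real data.

```python
import numpy as np


def local_pca_tangent(samples: np.ndarray, center: np.ndarray,
                      radius: float, dim: int) -> np.ndarray:
    """Hypothetical helper: estimate a dim-dimensional tangent basis at `center`.

    Uses all samples within Euclidean distance `radius` of `center`,
    centres them at their mean, and returns the top `dim` right singular
    vectors as the columns of a (D, dim) orthonormal matrix.
    """
    dists = np.linalg.norm(samples - center, axis=1)
    nbrs = samples[dists < radius]
    if len(nbrs) <= dim:
        raise ValueError("too few neighbours; increase radius")
    centred = nbrs - nbrs.mean(axis=0)
    # Rows of vt are principal directions, sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return vt[:dim].T


def tangent_error(u: np.ndarray, v: np.ndarray) -> float:
    """Spectral norm of (I - U U^T) V, i.e. the sine of the largest principal angle."""
    residual = v - u @ (u.T @ v)
    return float(np.linalg.norm(residual, ord=2))


if __name__ == "__main__":
    # Illustrative assumption: unit circle in R^2 with unbounded Gaussian noise.
    rng = np.random.default_rng(0)
    n, sigma = 2000, 0.1
    theta = rng.uniform(0.0, 2.0 * np.pi, n)
    on_circle = np.column_stack([np.cos(theta), np.sin(theta)])
    noisy = on_circle + sigma * rng.standard_normal((n, 2))

    y = noisy[0]
    p = y / np.linalg.norm(y)                   # oracle projection onto the circle
    true_tangent = np.array([[-p[1]], [p[0]]])  # tangent direction at p

    for name, c in [("at noisy sample y", y), ("at projection p", p)]:
        est = local_pca_tangent(noisy, c, radius=0.3, dim=1)
        print(name, tangent_error(est, true_tangent))
```

You can rerun the demo with other seeds, noise levels, or radii to explore how estimates centred at noisy samples differ from estimates centred at points on the manifold. The sketch makes no claim about which error is smaller in any single draw.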