In this work we study statistical properties of graph-based algorithms for multi-manifold clustering (MMC). In MMC the goal is to retrieve the multi-manifold structure underlying a given Euclidean data set, when the data set is assumed to be sampled from a distribution on a union of manifolds $\mathcal{M} = \mathcal{M}_1 \cup\dots \cup \mathcal{M}_N$ that may intersect one another and may have different dimensions. We investigate sufficient conditions that similarity graphs on data sets must satisfy for their corresponding graph Laplacians to capture the right geometric information to solve the MMC problem. More precisely, we provide high-probability error bounds for the spectral approximation of a tensorized Laplacian on $\mathcal{M}$ by a suitable graph Laplacian built from the observations; the recovered tensorized Laplacian contains all the geometric information of each individual underlying manifold. We provide an example of a family of similarity graphs satisfying these sufficient conditions, which we call annular proximity graphs with angle constraints. We contrast our family of graphs with other constructions in the literature that are based on the alignment of tangent planes. Extensive numerical experiments expand on the insights that our theory provides into the MMC problem.