Diffusion condensation is a dynamic process that yields a sequence of multiscale data representations that aim to encode meaningful abstractions. It has proven effective for manifold learning, denoising, clustering, and visualization of high-dimensional data. Diffusion condensation is constructed as a time-inhomogeneous process where each step first computes and then applies a diffusion operator to the data. We theoretically analyze the convergence and evolution of this process from geometric, spectral, and topological perspectives. From a geometric perspective, we obtain convergence bounds based on the smallest transition probability and the radius of the data, whereas from a spectral perspective, our bounds are based on the eigenspectrum of the diffusion kernel. Our spectral results are of particular interest since most of the literature on data diffusion is focused on homogeneous processes. From a topological perspective, we show diffusion condensation generalizes centroid-based hierarchical clustering. We use this perspective to obtain a bound based on the number of data points, independent of their location. To understand the evolution of the data geometry beyond convergence, we use topological data analysis. We show that the condensation process itself defines an intrinsic condensation homology. We use this intrinsic topology as well as the ambient persistent homology of the condensation process to study how the data changes over diffusion time. We demonstrate both types of topological information in well-understood toy examples. Our work gives theoretical insights into the convergence of diffusion condensation, and shows that it provides a link between topological and geometric data analysis.
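The iteration described above, in which each step first computes a diffusion operator from the current data and then applies it, can be illustrated with a minimal sketch. The abstract does not specify the kernel, bandwidth, or stopping rule, so the Gaussian kernel, the fixed bandwidth `epsilon`, the row normalization, and the diameter-based stopping tolerance below are all assumptions made for illustration. The function names are hypothetical and are not taken from the authors' code.

```python
import numpy as np


def diffusion_operator(X, epsilon):
    """Row-stochastic diffusion operator from a Gaussian kernel (assumed kernel choice)."""
    sq_norms = np.sum(X ** 2, axis=1)
    # Pairwise squared Euclidean distances; clip tiny negatives from round-off.
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    sq_dists = np.maximum(sq_dists, 0.0)
    K = np.exp(-sq_dists / epsilon)
    # Normalize each row so that it sums to one (Markov transition matrix).
    return K / K.sum(axis=1, keepdims=True)


def diameter(X):
    """Largest pairwise Euclidean distance in the point cloud."""
    diffs = X[:, None, :] - X[None, :, :]
    return float(np.sqrt((diffs ** 2).sum(axis=-1)).max())


def diffusion_condensation(X, epsilon=1.0, n_steps=50, tol=1e-6):
    """Time-inhomogeneous process: recompute the operator from the data at every step."""
    X = np.asarray(X, dtype=float)
    history = [X.copy()]
    for _ in range(n_steps):
        P = diffusion_operator(X, epsilon)  # first compute the operator ...
        X = P @ X                           # ... then apply it to the data
        history.append(X.copy())
        if diameter(X) < tol:               # assumed stopping rule
            break
    return history


# Toy usage: two well-separated Gaussian blobs in the plane.
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=0.0, scale=0.3, size=(20, 2))
blob_b = rng.normal(loc=5.0, scale=0.3, size=(20, 2))
history = diffusion_condensation(np.vstack([blob_a, blob_b]), epsilon=1.0, n_steps=30)
print(len(history), diameter(history[0]), diameter(history[-1]))
```

Because each row of the operator is a convex combination of the current points, every step keeps the data inside its previous convex hull, which is why the diameter is a natural quantity to monitor. The sequence stored in `history` is the multiscale family of representations; in this sketch it is only recorded, not analyzed topologically.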