In this paper we propose an adaptive approach to clustering and visualizing data through an orthogonalization process. Starting from a representation of the data points as a Markov process within the diffusion map framework, the method adaptively increases the orthogonality of the clusters by applying a feedback mechanism inspired by the Gromov-Wasserstein distance. This mechanism iteratively increases the spectral gap and refines the orthogonality of the data to achieve a clustering with high specificity. Because it uses the diffusion map framework and represents the relations between data points by transition probabilities, the method is robust with respect to the choice of underlying distance, noise in the data, and random initialization. We prove that the method converges globally to a unique fixed point for certain parameter values. We also propose a related approach where the transition probabilities of the Markov process are required to be doubly stochastic; in that case the method generates a minimizer of a nonconvex optimization problem. We apply the method to cryo-electron microscopy image data from biopharmaceutical manufacturing, where we can confirm biologically relevant insights related to therapeutic efficacy. We consider an example with morphological variations of gene packaging and confirm that the method produces biologically meaningful clustering results consistent with human expert classification.
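As a rough illustration of the starting point described above, the following sketch builds a Markov transition matrix from a Gaussian kernel, as is standard in the diffusion map framework, and measures the spectral gap that the method aims to increase. It does not implement the Gromov-Wasserstein-inspired feedback mechanism, which the abstract does not specify. The function names, the kernel bandwidth `epsilon`, and the synthetic two-cluster data are all assumptions made for illustration. The doubly stochastic variant is approximated here by Sinkhorn-style alternating normalization, which is one common choice and not necessarily the one used in the paper.

```python
import numpy as np


def transition_matrix(points, epsilon=1.0):
    """Row-stochastic Markov matrix built from a Gaussian kernel (assumed bandwidth epsilon)."""
    diffs = points[:, None, :] - points[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=-1)
    kernel = np.exp(-sq_dists / epsilon)
    # Normalize each row so it sums to one, giving transition probabilities.
    return kernel / kernel.sum(axis=1, keepdims=True)


def spectral_gap(P, n_clusters):
    """Difference between the n_clusters-th and (n_clusters + 1)-th largest eigenvalue moduli."""
    moduli = np.sort(np.abs(np.linalg.eigvals(P)))[::-1]
    return moduli[n_clusters - 1] - moduli[n_clusters]


def sinkhorn_normalize(matrix, n_iter=200):
    """Alternate row and column scaling toward a doubly stochastic matrix."""
    M = np.array(matrix, dtype=float)
    for _ in range(n_iter):
        M /= M.sum(axis=1, keepdims=True)  # rows sum to one
        M /= M.sum(axis=0, keepdims=True)  # columns sum to one
    return M


# Hypothetical test data: two well-separated clusters in the plane.
rng = np.random.default_rng(0)
points = np.vstack([
    rng.normal(0.0, 0.1, size=(20, 2)),
    rng.normal(10.0, 0.1, size=(20, 2)),
])

P = transition_matrix(points)
gap = spectral_gap(P, n_clusters=2)
D = sinkhorn_normalize(P)
```

For well-separated clusters the transition matrix is nearly block diagonal, so the leading eigenvalues (one per cluster) are close to one and the gap after them is large. A larger gap corresponds to clusters that are closer to orthogonal in the sense used above.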