The need to efficiently compare and represent datasets with unknown alignment arises in many fields, from model analysis and comparison in machine learning to trend discovery in collections of medical datasets. We use manifold learning to compare the intrinsic geometric structures of different datasets by comparing their diffusion operators, which are symmetric positive-definite (SPD) matrices related to approximations of the continuous Laplace-Beltrami operator from discrete samples. Existing methods typically compare such operators in a pointwise manner or assume that the data alignment is known. Instead, we exploit the Riemannian geometry of SPD matrices to compare these operators and define a new, theoretically motivated distance based on a lower bound of the log-Euclidean metric. Our framework facilitates the comparison of data manifolds expressed in datasets with different sizes, numbers of features, and measurement modalities. Our log-Euclidean signature (LES) distance recovers meaningful structural differences, outperforming competing methods in various application domains.
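To make the idea of comparing diffusion operators through their spectra more concrete, the following is a minimal sketch and not the LES distance as defined in the paper, whose exact formulation is not given in this abstract. It relies on a standard fact: for two SPD matrices of the same size, the Euclidean distance between their sorted log-eigenvalues is a lower bound on the log-Euclidean distance. Since eigenvalues do not depend on the ordering of the samples, such a quantity needs no alignment, and truncating to the leading `k` eigenvalues lets it be evaluated for datasets of different sizes. The function names, the median bandwidth heuristic, the truncation level `k`, and the regularizer `gamma` are all assumptions made for illustration.

```python
import numpy as np


def diffusion_operator(X, eps=None):
    """Symmetrically normalized Gaussian kernel matrix for samples X (n x d)."""
    n = X.shape[0]
    sq = np.sum(X**2, axis=1)
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    np.fill_diagonal(D2, 0.0)
    if eps is None:
        # Assumption: median of pairwise squared distances as bandwidth.
        eps = np.median(D2[np.triu_indices(n, k=1)])
    W = np.exp(-D2 / eps)
    d = W.sum(axis=1)
    # D^{-1/2} W D^{-1/2}: symmetric, with eigenvalues in (0, 1] in exact arithmetic.
    return W / np.sqrt(np.outer(d, d))


def log_spectrum(A, k, gamma=1e-8):
    """Logarithm of the k largest eigenvalues of A, in descending order."""
    lam = np.linalg.eigvalsh(A)[::-1][:k]
    # gamma (an assumption) guards against log(0) for numerically tiny eigenvalues.
    return np.log(np.maximum(lam, 0.0) + gamma)


def spectral_log_distance(X1, X2, k=10):
    """Distance between truncated log-spectra of two datasets' operators."""
    s1 = log_spectrum(diffusion_operator(X1), k)
    s2 = log_spectrum(diffusion_operator(X2), k)
    return float(np.linalg.norm(s1 - s2))


def log_euclidean_distance(A, B):
    """Log-Euclidean distance ||log A - log B||_F for same-size SPD matrices."""
    def logm_spd(M):
        w, V = np.linalg.eigh(M)
        return (V * np.log(w)) @ V.T  # V diag(log w) V^T
    return float(np.linalg.norm(logm_spd(A) - logm_spd(B)))
```

Under these assumptions, `spectral_log_distance(X1, X2)` can be called on arrays with different numbers of rows and columns, provided each has at least `k` samples. Rotating a dataset or permuting its samples leaves the value unchanged up to floating-point error, because neither operation changes the pairwise distances that define the kernel.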