Log-Euclidean Signatures for Intrinsic Distances Between Unaligned Datasets

The need for efficiently comparing and representing datasets with unknown alignment spans various fields, from model analysis and comparison in machine learning to trend discovery in collections of medical datasets. We use manifold learning to compare the intrinsic geometric structures of different datasets by comparing their diffusion operators, symmetric positive-definite (SPD) matrices that relate to approximations of the continuous Laplace-Beltrami operator from discrete samples. Existing methods typically compare such operators in a pointwise manner or assume known data alignment. Instead, we exploit the Riemannian geometry of SPD matrices to compare these operators and define a new theoretically-motivated distance based on a lower bound of the log-Euclidean metric. Our framework facilitates comparison of data manifolds expressed in datasets with different sizes, numbers of features, and measurement modalities. Our log-Euclidean signature (LES) distance recovers meaningful structural differences, outperforming competing methods in various application domains.
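To make the idea of an alignment-free lower bound concrete, here is a minimal sketch in Python with NumPy. It is not the paper's LES definition, which the abstract does not spell out. It assumes two SPD matrices of the same size and uses the Hoffman-Wielandt inequality, which gives one simple lower bound on the log-Euclidean distance $\|\log A - \log B\|_F$ that depends only on the sorted eigenvalues. The function names `spd_log`, `log_euclidean_distance`, and `spectral_lower_bound` are hypothetical and chosen for illustration.

```python
import numpy as np


def spd_log(A):
    """Matrix logarithm of a symmetric positive-definite matrix via eigendecomposition."""
    w, V = np.linalg.eigh(A)          # A = V diag(w) V^T with w > 0
    return (V * np.log(w)) @ V.T      # V diag(log w) V^T


def log_euclidean_distance(A, B):
    """Log-Euclidean distance ||log A - log B||_F (requires aligned, same-size inputs)."""
    return np.linalg.norm(spd_log(A) - spd_log(B), ord="fro")


def spectral_lower_bound(A, B):
    """Illustrative lower bound on the log-Euclidean distance (an assumption of this
    sketch, not the paper's LES): the Euclidean distance between sorted log-eigenvalues.
    It is unchanged when the rows/columns of A or B are permuted consistently."""
    la = np.log(np.linalg.eigvalsh(A))   # eigvalsh returns eigenvalues in ascending order
    lb = np.log(np.linalg.eigvalsh(B))
    return np.linalg.norm(la - lb)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    M = rng.normal(size=(6, 6))
    A = M @ M.T + 6 * np.eye(6)          # a random, well-conditioned SPD matrix
    P = np.eye(6)[rng.permutation(6)]    # permutation matrix: the same data, reordered
    B = P @ A @ P.T
    print("log-Euclidean distance:", log_euclidean_distance(A, B))
    print("spectral lower bound:  ", spectral_lower_bound(A, B))
```

In this sketch, reordering the samples generally changes the log-Euclidean distance, while the spectral bound stays at numerical zero, which is why a spectrum-based quantity can compare operators without a known alignment. Handling datasets of different sizes, as the abstract describes, would need additional steps that are not shown here.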
