Geometry of Similarity Comparisons

Many data analysis problems can be cast as distance geometry problems in \emph{space forms} -- Euclidean, spherical, or hyperbolic spaces. Absolute distance measurements are often unreliable or simply unavailable, and only proxies to absolute distances in the form of similarities are available. Hence we ask the following: Given only \emph{comparisons} of similarities amongst a set of entities, what can be said about the geometry of the underlying space form? To study this question, we introduce the notions of the \emph{ordinal capacity} of a target space form and the \emph{ordinal spread} of the similarity measurements. The latter is an indicator of complex patterns in the measurements, while the former quantifies the capacity of a space form to accommodate a set of measurements with a specific ordinal spread profile. We prove that the ordinal capacity of a space form is related to its dimension and the sign of its curvature. This leads to a lower bound on the Euclidean and spherical embedding dimension of what we term similarity graphs. More importantly, we show that the statistical behavior of the ordinal spread random variables defined on a similarity graph can be used to identify its underlying space form. We support our theoretical claims with experiments on weighted trees, single-cell RNA expression data, and spherical cartographic measurements.
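
To make the setting concrete, here is a minimal sketch of why comparisons alone can stand in for absolute distances: the ordering of entity pairs by distance is unchanged under any strictly increasing transformation of the distances. It does not compute ordinal capacity or ordinal spread, which the abstract does not define. The point set, the helper `pair_ranking`, and the `log1p` proxy are all assumptions chosen for illustration.

```python
import itertools
import math


def pair_ranking(points, dissimilarity):
    """Return all unordered index pairs, sorted from most to least similar."""
    pairs = list(itertools.combinations(range(len(points)), 2))
    return sorted(pairs, key=lambda p: dissimilarity(points[p[0]], points[p[1]]))


def euclidean(x, y):
    # True (normally unobserved) distance between two points.
    return math.dist(x, y)


def proxy(x, y):
    # Hypothetical similarity proxy: a strictly increasing function of the
    # true distance, so absolute values are distorted but order is kept.
    return math.log1p(euclidean(x, y))


# Illustrative points in the Euclidean plane (assumed data, not from the paper).
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 3.0), (5.0, 5.0)]

ranking_true = pair_ranking(points, euclidean)
ranking_proxy = pair_ranking(points, proxy)

# The two rankings coincide: the comparisons carry the same ordinal information.
print(ranking_true == ranking_proxy)
```

The example uses Euclidean points only for simplicity; the same invariance holds for any underlying metric, since sorting depends only on the order of the values.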
