Clustering categorical distributions in the finite-dimensional probability simplex is a fundamental task met in many applications dealing with normalized histograms. Traditionally, the differential-geometric structures of the probability simplex have been used either by (i) setting the Riemannian metric tensor to the Fisher information matrix of the categorical distributions, or (ii) defining the dualistic information-geometric structure induced by a smooth dissimilarity measure, the Kullback-Leibler divergence. In this work, we introduce for clustering tasks a novel computationally-friendly framework for geometrically modeling the probability simplex: the {\em Hilbert simplex geometry}. In the Hilbert simplex geometry, the distance is the non-separable Hilbert metric distance, which satisfies the property of information monotonicity and whose distance level sets are described by polytope boundaries. We show that both the Aitchison and Hilbert simplex distances are norm distances on normalized logarithmic representations with respect to the $\ell_2$ and variation norms, respectively. We discuss the pros and cons of those different statistical modelings, and experimentally benchmark these different kinds of geometries for center-based $k$-means and $k$-center clustering. Furthermore, since a canonical Hilbert distance can be defined on any bounded convex subset of Euclidean space, we also consider the Hilbert geometry of the elliptope of correlation matrices and study its clustering performance compared to the Frobenius and log-det divergences.
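As a minimal illustration of the norm-distance characterization stated in the abstract (a sketch, not code from the paper): the Hilbert simplex distance between two points of the open probability simplex is the variation (max minus min) of the coordinate-wise log-ratio vector, while the Aitchison distance is the $\ell_2$ norm between centered log-ratio representations. The function names below are our own choice.

```python
import numpy as np

def hilbert_simplex_distance(p, q):
    """Hilbert's projective metric on the open probability simplex:
    d(p, q) = max_i log(p_i/q_i) - min_i log(p_i/q_i),
    i.e. the variation norm of the log-ratio vector."""
    r = np.log(np.asarray(p, dtype=float)) - np.log(np.asarray(q, dtype=float))
    return float(r.max() - r.min())

def aitchison_distance(p, q):
    """Aitchison distance: Euclidean (l2) distance between
    centered log-ratio (clr) representations."""
    lp = np.log(np.asarray(p, dtype=float))
    lq = np.log(np.asarray(q, dtype=float))
    clr_p = lp - lp.mean()  # clr(p) = log p - mean(log p)
    clr_q = lq - lq.mean()
    return float(np.linalg.norm(clr_p - clr_q))
```

Both distances are invariant to rescaling of the inputs (only ratios of coordinates enter), which is one reason the log-representation viewpoint is computationally convenient; e.g. `hilbert_simplex_distance([0.5, 0.5], [0.25, 0.75])` equals $\log 3$.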