The Information-Geometric Perspective of Compositional Data Analysis

Information geometry uses the formal tools of differential geometry to describe the space of probability distributions as a Riemannian manifold with an additional dual structure. The formal equivalence of compositional data with discrete probability distributions makes it possible to apply the same description to the sample space of Compositional Data Analysis (CoDA). This sample space has been formally described as a Euclidean space with an orthonormal basis whose components are suitable combinations of the original parts. In contrast to the Euclidean metric, the information-geometric description singles out the Fisher information metric as the only one that keeps the manifold's geometric structure invariant under equivalent representations of the underlying random variables. Well-known concepts that are valid in Euclidean coordinates, e.g., the Pythagorean theorem, are generalized by information geometry to corresponding notions that hold for more general coordinates. We briefly review Euclidean CoDA and, in more detail, the information-geometric approach, and show how the latter justifies the use of distance measures and divergences that have so far received little attention in CoDA because they do not fit the Euclidean geometry favored by current thinking. We also show how entropy and relative entropy can describe amalgamations in a simple way, while Aitchison distance requires the use of geometric means to obtain more succinct relationships. We proceed to prove the information monotonicity property for Aitchison distance. We close with some thoughts about new directions in CoDA where the rich structure provided by information geometry could be exploited.
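The quantities named above can be made concrete with a small numerical sketch. The code below is an illustration under stated assumptions, not the construction used in the paper: the function names (`closure`, `clr`, `aitchison_distance`, `fisher_rao_distance`, `kl_divergence`, `entropy`, `amalgamate`) and the example compositions are hypothetical, and all parts are assumed strictly positive. It computes the Aitchison distance through the centered log-ratio transform, the geodesic distance induced by the Fisher information metric on the simplex, and the behavior of entropy and relative entropy under an amalgamation of parts.

```python
import numpy as np

def closure(x):
    """Rescale a positive vector so that its parts sum to 1."""
    x = np.asarray(x, dtype=float)
    return x / x.sum()

def clr(x):
    """Centered log-ratio transform: log parts minus their mean log."""
    logx = np.log(np.asarray(x, dtype=float))
    return logx - logx.mean()

def aitchison_distance(x, y):
    """Euclidean distance between clr-transformed compositions."""
    return float(np.linalg.norm(clr(x) - clr(y)))

def fisher_rao_distance(p, q):
    """Geodesic distance of the Fisher metric: 2 * arccos(sum sqrt(p_i q_i))."""
    bc = np.sum(np.sqrt(closure(p) * closure(q)))
    return float(2.0 * np.arccos(np.clip(bc, 0.0, 1.0)))

def kl_divergence(p, q):
    """Relative entropy KL(p || q) in nats."""
    p, q = closure(p), closure(q)
    return float(np.sum(p * np.log(p / q)))

def entropy(p):
    """Shannon entropy in nats."""
    p = closure(p)
    return float(-np.sum(p * np.log(p)))

def amalgamate(p, groups):
    """Sum the parts inside each group of indices (hypothetical helper)."""
    p = closure(p)
    return np.array([p[list(g)].sum() for g in groups])

# Hypothetical example data: two 4-part compositions and a 2-group amalgamation.
p = closure([0.1, 0.2, 0.3, 0.4])
q = closure([0.25, 0.25, 0.25, 0.25])
groups = [(0, 1), (2, 3)]

# Relative entropy cannot increase when parts are amalgamated.
kl_full = kl_divergence(p, q)
kl_coarse = kl_divergence(amalgamate(p, groups), amalgamate(q, groups))

# Entropy splits into a between-group term and weighted within-group terms.
weights = amalgamate(p, groups)
within = sum(w * entropy(p[list(g)]) for w, g in zip(weights, groups))
entropy_gap = entropy(p) - (entropy(weights) + within)

print(aitchison_distance(p, q), fisher_rao_distance(p, q))
print(kl_full, kl_coarse, entropy_gap)
```

In this sketch the relative entropy of the amalgamated compositions does not exceed that of the full compositions, and the entropy of the full composition equals the entropy of the amalgamated one plus the weighted entropies within each group, up to floating-point error. The Aitchison distance is computed here only for the unamalgamated compositions; its monotonicity is the subject of the proof mentioned above and is not checked by this example.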
