Here we show an application of our recently proposed information-geometric approach to compositional data analysis (CoDA). This application concerns relative count data, which are obtained, e.g., from sequencing experiments. First we review in some detail a variety of necessary concepts, ranging from basic count distributions and their information-geometric description, through the link between Bayesian statistics and shrinkage, to the use of power transformations in CoDA. We then show that powering, i.e., the equivalent of scalar multiplication on the simplex, can be understood as a shrinkage problem on the tangent space of the simplex. In information-geometric terms, traditional shrinkage corresponds to an optimization along a mixture (or m-) geodesic, while powering (or 'exponential' shrinkage) can be optimized along an exponential (or e-) geodesic. While the m-geodesic corresponds to the posterior mean of the multinomial counts using a conjugate prior, the e-geodesic corresponds to an alternative parametrization of the posterior in which prior and data contributions are weighted by geometric rather than arithmetic means. To optimize the exponential shrinkage parameter, we use the mean-squared error as a cost function on the tangent space, which is just the expected squared Aitchison distance from the true parameter. We derive an analytic solution for its minimum based on the delta method and test it via simulations. We also discuss exponential shrinkage as an alternative to zero imputation for dimension reduction and data normalization.
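The contrast between the two kinds of shrinkage can be sketched in a few lines of Python. This is an illustrative sketch only, not the estimator derived in the paper: the function names (`m_shrink`, `e_shrink`, `power`, `clr`), the uniform shrinkage target, the fixed weight `lam`, and the toy counts are all assumptions made for the example. It shows arithmetic mixing of proportions, geometric mixing followed by renormalization, and that geometric mixing toward a uniform target reduces to powering, which acts as scalar multiplication in centred log-ratio (clr) coordinates.

```python
import math

def closure(x):
    """Rescale positive values so they sum to 1 (a point on the simplex)."""
    total = sum(x)
    return [v / total for v in x]

def power(p, lam):
    """Powering: raise each part to lam and renormalize."""
    return closure([v ** lam for v in p])

def m_shrink(p, target, lam):
    """Arithmetic (mixture-type) shrinkage: weighted arithmetic mean."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(p, target)]

def e_shrink(p, target, lam):
    """Geometric (exponential-type) shrinkage: weighted geometric mean, renormalized."""
    return closure([a ** lam * b ** (1.0 - lam) for a, b in zip(p, target)])

def clr(p):
    """Centred log-ratio coordinates (requires strictly positive parts)."""
    logs = [math.log(v) for v in p]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def aitchison_distance(p, q):
    """Euclidean distance between clr coordinates."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(clr(p), clr(q))))

# Toy data (hypothetical): strictly positive counts, so logs are defined.
counts = [12, 5, 2, 1]
p_hat = closure(counts)
uniform = [1.0 / len(counts)] * len(counts)
lam = 0.7  # arbitrary weight chosen for illustration, not an optimized value

p_m = m_shrink(p_hat, uniform, lam)
p_e = e_shrink(p_hat, uniform, lam)

# Posterior mean under a symmetric Dirichlet prior (one pseudocount per part)
# equals arithmetic shrinkage with weight N / (N + A).
alpha = [1.0] * len(counts)
N, A = sum(counts), sum(alpha)
posterior_mean = [(n + a) / (N + A) for n, a in zip(counts, alpha)]
p_m_bayes = m_shrink(p_hat, uniform, N / (N + A))
```

With a uniform target, `p_e` coincides with `power(p_hat, lam)`, and its clr coordinates are `lam` times those of `p_hat`, so its Aitchison distance to the uniform composition shrinks by exactly the factor `lam`. Zeros in the counts are deliberately avoided here, since the logarithm is undefined for them.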