Here we show an application of our recently proposed information-geometric approach to compositional data analysis (CoDA). The application concerns relative count data, such as those obtained from sequencing experiments. We first review in some detail the necessary concepts: basic count distributions and their information-geometric description, the link between Bayesian statistics and shrinkage, and the use of power transformations in CoDA. We then show that powering, i.e., the equivalent of scalar multiplication on the simplex, can be understood as a shrinkage problem on the tangent space of the simplex. In information-geometric terms, traditional shrinkage corresponds to an optimization along a mixture (or m-) geodesic, while powering (or, as we call it, exponential shrinkage) can be optimized along an exponential (or e-) geodesic. While the m-geodesic corresponds to the posterior mean of the multinomial counts using a conjugate prior, the e-geodesic corresponds to an alternative parametrization of the posterior in which prior and data contributions are weighted by geometric rather than arithmetic means. To optimize the exponential shrinkage parameter, we use the mean squared error as a cost function on the tangent space; this is the expected squared Aitchison distance from the true parameter. We derive an analytic solution for its minimum based on the delta method and test it via simulations. We also discuss exponential shrinkage as an alternative to zero imputation for dimension reduction and data normalization.
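The contrast between the two geodesics can be illustrated with a minimal Python sketch. It is not the implementation used in this work: the counts, the uniform shrinkage target, the prior strength `a`, and all function names are assumptions chosen for illustration, and the counts are kept strictly positive so that logarithms are defined. The sketch computes arithmetic (m-geodesic) shrinkage, which coincides with the posterior mean under a Dirichlet prior, and geometric (e-geodesic) shrinkage, which for a uniform target amounts to powering and rescales the centered log-ratio (clr) coordinates.

```python
import numpy as np

def closure(x):
    """Rescale a positive vector so that its parts sum to 1."""
    x = np.asarray(x, dtype=float)
    return x / x.sum()

def clr(p):
    """Centered log-ratio transform (requires strictly positive parts)."""
    logp = np.log(np.asarray(p, dtype=float))
    return logp - logp.mean()

def m_shrink(p_hat, target, lam):
    """Arithmetic interpolation between p_hat and target (m-geodesic)."""
    return (1.0 - lam) * np.asarray(p_hat) + lam * np.asarray(target)

def e_shrink(p_hat, target, lam):
    """Geometric interpolation between p_hat and target (e-geodesic)."""
    log_mix = (1.0 - lam) * np.log(p_hat) + lam * np.log(target)
    return closure(np.exp(log_mix))

def aitchison_distance(p, q):
    """Euclidean distance between the clr coordinates of p and q."""
    return float(np.linalg.norm(clr(p) - clr(q)))

# Illustrative, strictly positive counts (assumption, not real data).
counts = np.array([12.0, 5.0, 2.0, 1.0])
N = counts.sum()
p_hat = counts / N                                # maximum-likelihood estimate
target = np.full(counts.size, 1.0 / counts.size)  # uniform shrinkage target

a = 4.0              # hypothetical prior strength (total pseudocount)
lam = a / (N + a)    # shrinkage weight implied by the Dirichlet prior

# Posterior mean under a Dirichlet prior with parameters a * target.
posterior_mean = (counts + a * target) / (N + a)

p_m = m_shrink(p_hat, target, lam)   # equals posterior_mean
p_e = e_shrink(p_hat, target, lam)   # clr(p_e) equals (1 - lam) * clr(p_hat)

print("m-geodesic:", p_m)
print("e-geodesic:", p_e)
print("Aitchison distance to target:", aitchison_distance(p_e, target))
```

Here the weight `lam` is fixed by the assumed prior strength purely for comparison; choosing it by minimizing the expected squared Aitchison distance, as described above, is not attempted in this sketch.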