We propose an estimation procedure for covariation in wide compositional data sets. For compositions, the widely used logratio variables are interdependent because they share a common reference. Logratio-uncorrelated compositions are those whose parts are linearly independent before the unit-sum constraint is imposed. We show how they can be used to construct bespoke shrinkage targets for logratio covariance matrices, and we test a simple procedure for partial correlation estimation on both a simulated and a single-cell gene expression data set. For the underlying counts, we evaluate different zero-imputation strategies. The partial correlation induced by the closure is derived analytically. Regularized partial correlations are implemented in the propr R package.
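The general pipeline (zero handling, logratio transform, shrinkage of the logratio covariance, inversion to partial correlations) can be illustrated with a minimal NumPy sketch. It is not the propr implementation and does not use the bespoke targets built from logratio-uncorrelated compositions. Instead, it assumes a pseudocount for zero imputation, the centred logratio (clr) transform, a generic diagonal shrinkage target, and a fixed, user-chosen shrinkage intensity `lam`. The function names `clr` and `shrunk_partial_correlation` are hypothetical.

```python
import numpy as np

def clr(x):
    """Centred logratio transform of strictly positive data (rows = samples, columns = parts)."""
    logx = np.log(x)
    # Subtracting each row's mean log makes the result invariant to closure (row scaling)
    return logx - logx.mean(axis=1, keepdims=True)

def shrunk_partial_correlation(counts, lam=0.2, pseudocount=1.0):
    """Hypothetical sketch: partial correlations from a shrunk clr covariance matrix.

    Assumptions (not the paper's method): zeros are handled with a pseudocount,
    the shrinkage target is the diagonal of the sample covariance, and the
    shrinkage intensity `lam` (0 < lam <= 1) is fixed rather than estimated.
    """
    x = np.asarray(counts, dtype=float) + pseudocount  # simple zero imputation
    z = clr(x)
    s = np.cov(z, rowvar=False)  # clr covariance: rows sum to zero, so it is singular
    target = np.diag(np.diag(s))  # generic diagonal target (swapped-in assumption)
    s_shrunk = (1.0 - lam) * s + lam * target  # positive definite for lam > 0
    prec = np.linalg.inv(s_shrunk)  # precision matrix
    d = np.sqrt(np.diag(prec))
    pcor = -prec / np.outer(d, d)  # partial correlation: -P_ij / sqrt(P_ii * P_jj)
    np.fill_diagonal(pcor, 1.0)
    return pcor

# Usage on simulated counts (6 parts, 100 samples)
rng = np.random.default_rng(0)
counts = rng.poisson(5.0, size=(100, 6))
pcor = shrunk_partial_correlation(counts, lam=0.2)
print(pcor.shape)
```

Some regularization is unavoidable here. Each row of the clr covariance matrix sums to zero, so with `lam=0` the matrix is singular and `np.linalg.inv` cannot be applied. In practice the intensity would be estimated from the data, and the target would be chosen to reflect the logratio structure rather than a plain diagonal.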