Microbiome data analysis requires statistical models that can simultaneously decode how microbes respond to the environment and how they interact with one another. While a multiresponse linear regression model seems like a straightforward solution, we argue that treating it as a graphical model is flawed: the regression coefficient matrix is not the adjacency matrix, so it does not encode the conditional dependence structure between response and predictor nodes. This observation is especially important in biological settings, where prior knowledge about edges comes from specific experimental interventions and can only be properly encoded under a conditional dependence model. Here, we propose a chain graph model with two sets of nodes (predictors and responses) whose solution yields a graph with edges that do represent conditional dependence and thus agrees with the experimenter's intuition about the average behavior of nodes under treatment. The solution to our model is sparse via the Bayesian LASSO and is also guaranteed to be the sparse solution to a Conditional Auto-Regressive (CAR) model. In addition, we propose an adaptive extension so that different shrinkage can be applied to different edges to incorporate edge-specific prior knowledge. Our model is computationally inexpensive thanks to an efficient Gibbs sampling algorithm and can accommodate binary, count, and compositional responses via an appropriate hierarchical structure. Finally, we apply our model to human gut and soil microbial composition datasets.
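To make the distinction between regression coefficients and conditional dependence concrete, here is a minimal numerical sketch. It assumes a Gaussian chain graph parameterization that the abstract does not spell out: responses satisfy Y | X = x ~ N(Omega^{-1} B^T x, Omega^{-1}), so that a zero entry B[j, k] means predictor j and response k are conditionally independent given all other nodes, while the multiresponse regression coefficient matrix is B Omega^{-1}. The matrices `B` and `Omega` below are hypothetical toy values, not estimates from any dataset.

```python
import numpy as np

# Hypothetical toy example (assumed parameterization, not taken from the abstract):
# 2 predictors (rows of B) and 3 responses (columns of B).
# B[j, k] != 0 is read as a direct predictor-response edge in the chain graph.
B = np.array([
    [1.0, 0.0, 0.0],   # predictor 0 directly affects response 0 only
    [0.0, 0.0, 0.0],   # predictor 1 has no direct edges
])

# Omega is the precision matrix among responses; nonzero off-diagonal
# entries are response-response edges (here a chain 0 - 1 - 2).
Omega = np.array([
    [2.0, -1.0, 0.0],
    [-1.0, 2.0, -1.0],
    [0.0, -1.0, 2.0],
])

# Under the assumed model, E[Y | X = x] = Omega^{-1} B^T x, so the
# regression coefficient matrix (predictors x responses) is B @ Omega^{-1}.
Sigma = np.linalg.inv(Omega)
coef = B @ Sigma

# Compare the two sparsity patterns.
direct_edges = np.abs(B) > 1e-12         # conditional dependence structure
marginal_effects = np.abs(coef) > 1e-12  # nonzero regression coefficients

print("direct edges (B):\n", direct_edges.astype(int))
print("regression coefficients (B @ inv(Omega)):\n", coef)
```

In this toy setup predictor 0 has a single direct edge, yet its regression coefficients on all three responses are nonzero (0.75, 0.5, 0.25), because its effect propagates through the response-response edges. Reading the coefficient matrix as an adjacency matrix would therefore report two edges that the chain graph does not contain.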