Bayesian Chain Graph LASSO Models to Learn Sparse Microbial Networks with Predictors

Microbiome data require statistical models that can simultaneously decode microbes' reactions to the environment and interactions among microbes. While a multiresponse linear regression model seems like a straightforward solution, we argue that treating it as a graphical model is flawed: the regression coefficient matrix is not the adjacency matrix, so it does not encode the conditional dependence structure between response and predictor nodes. This observation is especially important in biological settings where we have prior knowledge on the edges from specific experimental interventions, which can only be properly encoded under a conditional dependence model. Here, we propose a chain graph model with two sets of nodes (predictors and responses) whose solution yields a graph with edges that do represent conditional dependence and thus agrees with the experimenter's intuition on the average behavior of nodes under treatment. The solution to our model is sparse via Bayesian LASSO. In addition, we propose an adaptive extension so that different shrinkage can be applied to different edges to incorporate edge-specific prior knowledge. Our model is computationally inexpensive through an efficient Gibbs sampling algorithm and can account for binary, count, and compositional responses via an appropriate hierarchical structure. We apply our model to human gut and soil microbial compositional datasets, and we highlight that CG-LASSO can estimate biologically meaningful network structures in the data. The CG-LASSO software is available as an R package at https://github.com/YunyiShen/CAR-LASSO.
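
The claim that regression coefficients are not an adjacency matrix can be illustrated with a small numerical sketch. It assumes one common Gaussian chain graph parameterization, Y | x ~ N(Σ Bᵀ x, Σ) with Σ = Ω⁻¹, which the abstract does not spell out; the names `Omega`, `B`, and `marginal_coef` and all numbers are hypothetical toy values, not taken from the paper or the R package.

```python
import numpy as np

# Assumed toy chain graph: 1 predictor, 3 responses (all values hypothetical).
# Omega is the precision matrix among responses: responses 0-1 and 1-2 are
# conditionally dependent, responses 0-2 are not.
Omega = np.array([[ 2.0, -1.0,  0.0],
                  [-1.0,  2.0, -1.0],
                  [ 0.0, -1.0,  2.0]])

# B holds the predictor-to-response conditional effects (shape p x k).
# Here the predictor has a direct edge to response 0 only.
B = np.array([[1.0, 0.0, 0.0]])

Sigma = np.linalg.inv(Omega)

# Under the assumed model E[Y | x] = Sigma @ B.T @ x, so a multiresponse
# linear regression of Y on x estimates B @ Sigma rather than B.
marginal_coef = B @ Sigma

def support(M, tol=1e-10):
    """Boolean mask of entries that are non-zero up to a tolerance."""
    return np.abs(M) > tol

print("direct edges (B):        ", support(B))
print("regression coefficients: ", marginal_coef)
```

In this toy setup the predictor is directly connected to one response, yet every regression coefficient is non-zero, because the effect propagates through the response-response edges in `Omega`. Reading the regression coefficient matrix as a graph would therefore report edges that the conditional dependence structure does not contain.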
