Bayesian Conditional Auto-Regressive LASSO Models to Learn Sparse Microbial Networks with Predictors

Microbiome data analyses require statistical models that can simultaneously decode microbes' reactions to the environment and the interactions among microbes. While a multiresponse linear regression model seems like a straightforward solution, we argue that treating it as a graphical model is flawed: the regression coefficient matrix does not encode the conditional dependence structure between response and predictor nodes, because it does not represent the adjacency matrix. This observation is especially important in biological settings where we have prior knowledge of edges from specific experimental interventions, knowledge that can only be properly encoded under a conditional dependence model. Here, we propose a chain graph model with two sets of nodes (predictors and responses) whose solution yields a graph with edges that indeed represent conditional dependence and thus agrees with the experimenter's intuition on the average behavior of nodes under treatment. The solution to our model is sparse via the Bayesian LASSO and is also guaranteed to be the sparse solution to a Conditional Auto-Regressive (CAR) model. In addition, we propose an adaptive extension so that different shrinkage can be applied to different edges to incorporate edge-specific prior knowledge. Our model is computationally inexpensive through an efficient Gibbs sampling algorithm and can account for binary, count, and compositional responses via an appropriate hierarchical structure. We apply our model to human gut and soil microbial compositional datasets and highlight that CAR-LASSO can estimate biologically meaningful network structures in the data. The CAR-LASSO software is available as an R package at https://github.com/YunyiShen/CAR-LASSO.
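To make the difference between regression coefficients and conditional dependence concrete, here is a minimal pure-Python sketch. It assumes the Gaussian chain-graph parameterization Y | x ~ N(Omega^{-1} B^T x, Omega^{-1}) with the intercept set to zero, where B (predictors x responses) holds predictor-to-response edge weights and Omega is the precision matrix among responses. This is our reading of the CAR formulation, not code from the CAR-LASSO package; the matrices and the helper names `inv2`, `matmul`, and `cond_mean` are hypothetical.

```python
# Hypothetical illustration, assuming Y | x ~ N(Omega^{-1} B^T x, Omega^{-1})
# with zero intercept. B[k][j] is the edge weight from predictor k to
# response j; Omega is the precision matrix among responses.

def inv2(m):
    """Invert a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

Omega = [[1.0, 0.5], [0.5, 1.0]]  # response-response precision (edge y1-y2)
B = [[1.0, 0.0], [0.0, 0.0]]      # single predictor edge: x1 -> y1

# Marginal regression coefficients of y on x are B Omega^{-1}.
coef = matmul(B, inv2(Omega))
print(coef)  # x1 gets a nonzero coefficient on y2 even though B[0][1] == 0

def cond_mean(j, y, x):
    """CAR full-conditional mean of y_j given the other responses and x."""
    drive = sum(B[k][j] * x[k] for k in range(len(x)))
    neighbours = sum(Omega[j][l] * y[l] for l in range(len(y)) if l != j)
    return (drive - neighbours) / Omega[j][j]

# Changing x1 leaves y2's full conditional unchanged: no direct x1 -> y2 edge.
print(cond_mean(1, [0.3, 0.0], [1.0, 0.0]), cond_mean(1, [0.3, 0.0], [5.0, 0.0]))
```

With the single edge x1 -> y1, the marginal coefficient matrix B Omega^{-1} still assigns x1 a coefficient of -2/3 on y2, because the effect propagates through the y1-y2 edge. The CAR full conditional of y2, by contrast, depends on x only through column 2 of B, so it correctly shows no direct x1-y2 dependence.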