Inferring a graphical structure whose nodes include multiple responses and predictors is a fundamental statistical problem with broad applications ranging from the microbiome and ecology to genetics. Although a multiresponse linear regression model seems like a straightforward solution, we argue that treating it as a graphical model is flawed and calls for caution: the regression coefficient matrix does not represent the adjacency matrix between response and predictor nodes, which is what encodes the conditional dependence structure. This observation is especially important in biological settings, where prior knowledge about the edges is often available. Here, we propose an alternative to the multiresponse linear regression model whose solution yields a graph in which edges indeed represent conditional dependence. Sparsity of the solution is induced via the Bayesian LASSO, and the solution is also guaranteed to be the sparse solution to a Conditional Auto-Regressive (CAR) model. In addition, we propose an adaptive extension in which different amounts of shrinkage can be applied to different edges, so that edge-specific prior knowledge can be incorporated. Our model is computationally inexpensive thanks to an efficient Gibbs sampling algorithm, and it can accommodate binary, count, and compositional responses through an appropriate hierarchical structure. Finally, we apply our model to human gut and soil microbial composition datasets.
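The central claim, that regression coefficients are not the response-predictor adjacency matrix, can be illustrated with a generic Gaussian graphical model. The sketch below is not the authors' model: it assumes a joint Gaussian over two responses and two predictors with a hand-picked, hypothetical precision matrix `omega`. The nonzero pattern of the precision matrix defines the conditional dependence graph, while the coefficients of the regression of Y on X are B = -Ω_YY⁻¹ Ω_YX, which can be nonzero where Ω_YX is zero.

```python
import numpy as np

# Hypothetical joint precision matrix for nodes ordered [Y1, Y2, X1, X2].
# For a Gaussian, omega[i, j] == 0 means nodes i and j are conditionally
# independent given all other nodes, so the nonzero pattern is the graph.
omega = np.array([
    [2.0, 0.8, 0.5, 0.0],  # Y1: edges to Y2 and X1
    [0.8, 2.0, 0.0, 0.0],  # Y2: edge to Y1 only, no predictor edges
    [0.5, 0.0, 1.5, 0.0],  # X1: edge to Y1 only
    [0.0, 0.0, 0.0, 1.5],  # X2: isolated node
])
assert np.all(np.linalg.eigvalsh(omega) > 0)  # valid (positive definite) precision

q = 2  # number of response nodes
omega_yy, omega_yx = omega[:q, :q], omega[:q, q:]

# Response-predictor adjacency, read directly off the precision matrix.
adjacency = omega_yx != 0

# Coefficients of the regression of Y on X implied by the same joint Gaussian:
# E[Y | X = x] = mu_Y + B (x - mu_X),  with  B = -inv(Omega_YY) @ Omega_YX.
B = -np.linalg.solve(omega_yy, omega_yx)

print("adjacency (rows Y, cols X):\n", adjacency)
print("regression coefficients B (rows Y, cols X):\n", np.round(B, 3))
```

Here Y2 and X1 share no edge (`omega[1, 2] == 0`), yet `B[1, 0]` is about 0.12, because the dependence flows through Y1. Reading the sparsity pattern of B as a graph would therefore report a spurious Y2–X1 edge.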
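The abstract attributes sparsity to the Bayesian LASSO and efficiency to Gibbs sampling. As a rough illustration of that combination only, the sketch below implements the standard Bayesian LASSO Gibbs sampler for a single linear regression (the Park and Casella formulation), not the authors' CAR-based sampler. It assumes a centered response, centered predictor columns, no intercept, and the improper prior 1/σ². The function name `bayesian_lasso_gibbs` and its parameters are hypothetical. Letting `lam` be a per-coefficient array loosely mirrors the idea of applying different shrinkage to different edges.

```python
import numpy as np

def bayesian_lasso_gibbs(X, y, lam, n_iter=2000, burn_in=500, seed=0):
    """Gibbs sampler for the standard Bayesian LASSO (single response).

    Assumes y is centered, X has centered columns, and sigma^2 has the
    improper prior 1/sigma^2. `lam` is a scalar or a length-p array
    (per-coefficient shrinkage). Returns beta draws after burn-in.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    lam2 = np.broadcast_to(np.asarray(lam, dtype=float) ** 2, (p,))
    XtX, Xty = X.T @ X, X.T @ y
    beta = np.zeros(p)
    sigma2 = 1.0
    inv_tau2 = np.ones(p)
    draws = np.empty((n_iter - burn_in, p))

    for it in range(n_iter):
        # beta | rest ~ N(A^{-1} X'y, sigma2 * A^{-1}),  A = X'X + diag(1/tau^2)
        A = XtX + np.diag(inv_tau2)
        L = np.linalg.cholesky(A)
        mean = np.linalg.solve(A, Xty)
        # z = L^{-T} e with e ~ N(0, I) has covariance (L L^T)^{-1} = A^{-1}
        z = np.linalg.solve(L.T, rng.standard_normal(p))
        beta = mean + np.sqrt(sigma2) * z

        # sigma2 | rest ~ Inv-Gamma((n + p)/2, (RSS + sum(beta^2 / tau^2)) / 2)
        resid = y - X @ beta
        rate = 0.5 * (resid @ resid + np.sum(beta**2 * inv_tau2))
        sigma2 = 1.0 / rng.gamma(shape=0.5 * (n + p), scale=1.0 / rate)

        # 1/tau_j^2 | rest ~ Inverse-Gaussian(sqrt(lam_j^2 sigma2 / beta_j^2), lam_j^2)
        mu = np.sqrt(lam2 * sigma2 / np.maximum(beta**2, 1e-12))
        inv_tau2 = rng.wald(mu, lam2)

        if it >= burn_in:
            draws[it - burn_in] = beta
    return draws

# Usage on simulated data with a sparse true coefficient vector.
rng = np.random.default_rng(1)
n, p = 200, 5
X = rng.standard_normal((n, p))
X -= X.mean(axis=0)
beta_true = np.array([2.0, 0.0, 0.0, -1.5, 0.0])
y = X @ beta_true + 0.5 * rng.standard_normal(n)
y -= y.mean()

draws = bayesian_lasso_gibbs(X, y, lam=1.0)
post_mean = draws.mean(axis=0)
print(np.round(post_mean, 2))
```

Each full conditional is a standard distribution (normal, inverse gamma, inverse Gaussian), which is what keeps Gibbs sampling for the Bayesian LASSO cheap; the posterior mean should land near `beta_true`, with the zero coefficients shrunk toward zero.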