Variable selection has played a critical role in contemporary statistics and scientific discovery. Numerous regularization and Bayesian variable selection methods have been developed in the past two decades, but they mainly target a single response. As more data are collected, it is common to obtain and analyze multiple correlated responses from the same study. Running a separate regression for each response ignores their correlation, so multivariate analysis is recommended. Existing multivariate methods select variables related to all responses without considering possible heterogeneous sparsity across responses, i.e., some features may predict only a subset of responses and not the rest. In this paper, we develop a novel Bayesian indicator variable selection method for multivariate regression models with a large number of grouped predictors, targeting multiple correlated responses with possibly heterogeneous sparsity patterns. The method is motivated by the multi-trait fine mapping problem in genetics, which aims to identify variants that are causal for multiple related traits. Our method performs selection at the individual level, at the group level, and specifically for each response. In addition, we propose a new concept, the subset posterior inclusion probability, for inference that prioritizes predictors targeting subset(s) of responses. Extensive simulations with varying sparsity levels, heterogeneity levels, and dimensions show the advantage of our method in variable selection and prediction performance compared to existing general Bayesian multivariate variable selection methods and Bayesian fine mapping methods. We also applied our method to a real data example in imaging genetics and identified important causal variants for brain white matter structural change in different regions.
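To make the notion of heterogeneous sparsity concrete, the following minimal sketch builds a small coefficient matrix in which some predictors affect every response, some affect only a subset, and some affect none, and then derives the corresponding inclusion indicators. It illustrates the sparsity pattern only and is not the proposed method. The matrix `B`, its values, and the helper names `indicator_matrix` and `sparsity_pattern` are all hypothetical and chosen purely for illustration.

```python
# Hypothetical coefficient matrix: rows are predictors, columns are responses.
# All values are made up for illustration; they do not come from the paper.
B = [
    [0.8, 0.5, 0.6],  # predictor 0: affects every response
    [0.0, 0.7, 0.0],  # predictor 1: affects only response 1
    [0.4, 0.0, 0.3],  # predictor 2: affects responses 0 and 2
    [0.0, 0.0, 0.0],  # predictor 3: affects no response
]


def indicator_matrix(coefs):
    """Return gamma, where gamma[j][k] is 1 if predictor j has a nonzero effect on response k."""
    return [[1 if b != 0 else 0 for b in row] for row in coefs]


def sparsity_pattern(gamma_row):
    """Classify one predictor by how many responses it is selected for."""
    n_selected = sum(gamma_row)
    if n_selected == 0:
        return "none"
    if n_selected == len(gamma_row):
        return "all"
    return "subset"


gamma = indicator_matrix(B)
patterns = [sparsity_pattern(row) for row in gamma]
# For each predictor, list the indices of the responses it is selected for.
selected_responses = [[k for k, g in enumerate(row) if g] for row in gamma]

for j, (pattern, responses) in enumerate(zip(patterns, selected_responses)):
    print(f"predictor {j}: pattern={pattern}, responses={responses}")
```

In this toy setting, a method that selects variables only jointly for all responses would have no way to represent predictors 1 and 2, which is the gap that response-specific selection is meant to address.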