Consider the normal linear regression setup in which the number of covariates p is much larger than the sample size n and the covariates form correlated groups. The response variable y is not related to an entire group of covariates on an all-or-none basis; rather, the sparsity assumption holds both within and between groups. We extend the traditional g-prior setup to this framework. Variable selection consistency of the proposed method is shown under fairly general conditions, assuming the covariates to be random and allowing the true model to grow with both n and p. To implement the proposed g-prior method in high-dimensional settings, we propose two procedures: first, a group screening procedure, termed group SIS (GSIS); and second, a novel stochastic search variable selection algorithm, termed the group informed variable selection algorithm (GiVSA), which uses the known group structure to explore the model space efficiently without discarding any covariate on the basis of an initial screening. We study the screening consistency of GSIS, and the theoretical mixing time of GiVSA using the canonical path ensemble approach of Yang et al. (2016). The performance of the proposed prior, implemented with GSIS as well as GiVSA, is validated on various simulated examples and a real data set on residential buildings.