Although variable selection is one of the most popular areas of modern statistical research, far more of its development has taken place in the classical paradigm than in its Bayesian counterpart. Somewhat surprisingly, both paradigms have focussed almost exclusively on linear models, in spite of the vast scope offered by the model liberation movement brought about by modern advances in the study of real, complex phenomena. In this article, we investigate general Bayesian variable selection in models driven by Gaussian processes, which allows us to treat linear, non-linear and nonparametric models, even in dependent setups, in the same vein. We take the Bayes factor route to variable selection and develop a general asymptotic theory for the Gaussian process framework in "large p, large n" settings, even with p >> n, establishing almost sure exponential convergence of the Bayes factor under appropriately mild conditions. The fixed p setup is included as a special case. To illustrate, we apply our general result to variable selection in linear regression, in a Gaussian process model whose squared exponential covariance function accommodates the covariates, and in a first-order autoregressive process with time-varying covariates. We also follow up our theoretical investigations with ample simulation experiments in the above regression contexts and with variable selection in a real riboflavin data set consisting of 71 observations but 4088 covariates. To implement variable selection using Bayes factors, we develop a novel and effective general-purpose transdimensional, transformation-based Markov chain Monte Carlo algorithm, which has played a crucial role in our simulated and real data applications.
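To make the Bayes factor route concrete, the following is a minimal sketch, not the algorithm or prior structure of this article. It assumes a zero-mean Gaussian process with a squared exponential covariance function whose hyperparameters (length scale, signal variance, noise variance) are held fixed, so that the marginal likelihood of each covariate subset is available in closed form. The function names, the simulated data and the hyperparameter values are all illustrative assumptions.

```python
import math
import random


def se_kernel(X, cols, length_scale=1.0, signal_var=1.0):
    """Squared exponential covariance matrix built from the covariates in `cols` only."""
    n = len(X)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d2 = sum((X[i][c] - X[j][c]) ** 2 for c in cols)
            K[i][j] = signal_var * math.exp(-0.5 * d2 / length_scale ** 2)
    return K


def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def log_marginal_likelihood(X, y, cols, noise_var=0.01):
    """log N(y | 0, K_cols + noise_var * I), with hyperparameters fixed (an assumption)."""
    n = len(y)
    K = se_kernel(X, cols)
    for i in range(n):
        K[i][i] += noise_var
    L = cholesky(K)
    # Forward substitution: solve L z = y, so that y' K^{-1} y = z' z.
    z = []
    for i in range(n):
        z.append((y[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    quad = sum(v * v for v in z)
    logdet = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    return -0.5 * quad - 0.5 * logdet - 0.5 * n * math.log(2.0 * math.pi)


def log_bayes_factor(X, y, cols_a, cols_b):
    """Log Bayes factor of covariate subset `cols_a` against `cols_b`."""
    return log_marginal_likelihood(X, y, cols_a) - log_marginal_likelihood(X, y, cols_b)


def simulate(n, seed=0):
    """Toy data: the response depends on covariate 0 only; covariate 1 is irrelevant."""
    rng = random.Random(seed)
    X = [[rng.uniform(-2.0, 2.0), rng.uniform(-2.0, 2.0)] for _ in range(n)]
    y = [math.sin(2.0 * x[0]) + 0.1 * rng.gauss(0.0, 1.0) for x in X]
    return X, y


X, y = simulate(40)
print("log BF, {0} vs {1}:  ", log_bayes_factor(X, y, [0], [1]))
print("log BF, {0} vs {0,1}:", log_bayes_factor(X, y, [0], [0, 1]))
```

A positive log Bayes factor favours the first subset. In the setting of this article the hyperparameters would carry priors and the model space would be explored by Markov chain Monte Carlo, so the closed-form comparison above is only a simplified stand-in.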