We study frequentist properties of Bayesian and $L_0$ model selection, with a focus on (potentially non-linear) high-dimensional regression. We propose a construction to study how posterior probabilities and normalized $L_0$ criteria concentrate on the (Kullback-Leibler) optimal model and on other subsets of the model space. When such concentration occurs, one also bounds the frequentist probability of selecting the correct model, as well as type I and type II error probabilities. These results hold generally and help validate the use of posterior probabilities and $L_0$ criteria to control the frequentist error probabilities associated with model selection and hypothesis tests. In the regression setting, our results clarify the effect of the sparsity imposed by the prior or the $L_0$ penalty, and of problem characteristics such as the sample size, signal-to-noise ratio, dimension and true sparsity. A particular finding is that one may use less sparse formulations than would be asymptotically optimal, yet still attain consistency and often significantly better finite-sample performance. We also prove new results on misspecification of the mean or covariance structure, and give tighter rates for certain non-local priors than currently available.