This article proposes post-selective inference for Gaussian models via approximate maximum likelihood. Our proposal serves two key goals: (i) efficient use of hold-out information from selection by exploiting randomization; (ii) computational ease by bypassing expensive MCMC sampling from intractable conditional distributions. At the core of our method is the solution to a convex optimization problem that takes a separable form across the multiple learning queries used during selection. This separability enables efficient and tractable inference in many practical scenarios where more than one query informs inference. We illustrate the potential of our approximate method through comparisons with existing strategies across wide-ranging signal-to-noise regimes and on gene expression data from TCGA (The Cancer Genome Atlas).
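To fix ideas, the display below is a schematic of the conditional-likelihood idea underlying such proposals, written in generic notation that is not the paper's own: $Z$ denotes the Gaussian data, $f_{\theta}$ its density, $\widehat{E}$ the randomized selection event with randomization covariance $\Omega$, and $o$ the optimization variables of the convex program; the barrier term is one common device for encoding the selection constraints. It is meant only as a sketch of how an intractable log-normalizer can be replaced by the value of a convex optimization, which separates across queries when several selection queries are involved.

% Schematic only; generic symbols ($Z$, $f_\theta$, $\widehat{E}$, $o$, $\Omega$, $\mu_\theta$)
% are illustrative assumptions, not the paper's exact formulation.
\[
  \widehat{\theta}_{\mathrm{MLE}}
  \;=\;
  \operatorname*{arg\,max}_{\theta}\;
  \Big\{ \log f_{\theta}(Z) \;-\; \log \mathbb{P}_{\theta}\big(\widehat{E}\big) \Big\},
  \qquad
  \log \mathbb{P}_{\theta}\big(\widehat{E}\big)
  \;\approx\;
  -\inf_{o}\;\Big\{ \tfrac{1}{2}\,(o-\mu_{\theta})^{\top}\Omega^{-1}(o-\mu_{\theta})
  \;+\; \mathrm{Barrier}(o) \Big\}.
\]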