Scientists must often discover signals and localize them at the same time. For instance, in genetic fine-mapping, high correlations between nearby genetic variants make it hard to identify the exact locations of causal variants. The statistical task is therefore to output as many disjoint regions containing a signal as possible, each as small as possible, while controlling false positives. Similar problems arise in any application where signals cannot be perfectly localized, such as locating stars in astronomical surveys and detecting changepoints in sequential data. Common Bayesian approaches to these problems compute a posterior distribution over signal locations. However, existing procedures for translating these posteriors into credible regions for the signals fail to capture all the information in the posterior, leading to lower power and (sometimes) inflated false discoveries. With this motivation, we introduce Bayesian Linear Programming (BLiP). Given a posterior distribution over signals, BLiP outputs credible regions for the signals. Although the underlying optimization problem is extremely high-dimensional and nonconvex, BLiP verifiably nearly maximizes expected power while controlling false positives. BLiP is very computationally efficient compared to the cost of computing the posterior, and it can wrap around nearly any Bayesian model and algorithm. Applying BLiP to existing state-of-the-art analyses of UK Biobank data (for genetic fine-mapping) and the Sloan Digital Sky Survey (for astronomical point source detection) increased power by 30-120% with just a few minutes of additional computation. BLiP is implemented in pyblip (Python) and blipr (R).
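To make the task in the abstract concrete, the following toy sketch selects disjoint regions from a handful of candidates by exhaustive search. It is not the pyblip API and not the linear program that BLiP solves. The posterior probabilities are hypothetical illustrative numbers, and two modeling choices are assumptions that the abstract does not state: a region of size k earns weight 1/k (so smaller regions are preferred), and false positives are controlled by requiring that the expected fraction of selected regions containing no signal be at most `q`.

```python
from itertools import combinations

# Hypothetical posterior probabilities that each candidate region contains
# at least one signal (illustrative numbers, not from any real analysis).
# Each region is a set of location indices.
candidates = {
    frozenset({0}): 0.50,
    frozenset({1}): 0.45,
    frozenset({0, 1}): 0.95,
    frozenset({2}): 0.97,
    frozenset({3}): 0.20,
    frozenset({2, 3}): 0.99,
}


def select_regions(candidates, q=0.1):
    """Exhaustively pick disjoint regions maximizing expected weighted power.

    Assumptions (not from the abstract): a region of size k earns weight 1/k,
    and the expected fraction of selected regions with no signal must be <= q.
    """
    regions = list(candidates)
    best, best_power = [], 0.0
    for r in range(1, len(regions) + 1):
        for subset in combinations(regions, r):
            # Disjointness: no location may appear in two selected regions.
            covered = set().union(*subset)
            if len(covered) != sum(len(g) for g in subset):
                continue
            # Expected false discovery proportion of this selection.
            exp_fdp = sum(1 - candidates[g] for g in subset) / len(subset)
            if exp_fdp > q:
                continue
            # Expected power, down-weighting larger (less precise) regions.
            power = sum(candidates[g] / len(g) for g in subset)
            if power > best_power:
                best, best_power = list(subset), power
    return best, best_power


selected, power = select_regions(candidates, q=0.1)
```

With these numbers the search selects the region {0, 1} together with the singleton {2}: locations 0 and 1 cannot be told apart individually, but jointly they almost certainly contain a signal, while location 2 is confidently localized on its own. Exhaustive search grows exponentially with the number of candidate regions, so it is only usable on toy inputs like this one.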