Post-selection inference has recently been proposed as a way of quantifying uncertainty about detected changepoints. The idea is to run a changepoint detection algorithm and then re-use the same data to test for a change near each of the detected changepoints. Provided the p-value for each test is defined conditionally on the information used to choose that test, this approach produces valid p-values. We show how to improve the power of these procedures by conditioning on less information. This gives rise to an ideal selective p-value that is intractable but can be approximated by Monte Carlo. We show that, for any Monte Carlo sample size, this procedure produces valid p-values, and we show empirically that a noticeable increase in power is possible with only very modest Monte Carlo sample sizes. Our procedure is easy to implement given existing post-selection inference methods, as we only need to generate perturbations of the data set and re-apply the post-selection method to each of them. On genomic data consisting of human GC content, our procedure increases the number of significant changepoints detected, for example from 17 to 27, compared with existing methods.
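To illustrate why the p-value must be conditional on the selection, the following minimal sketch simulates data with no change, picks the most likely changepoint, and then tests at that location with the same data while ignoring the selection. It is not the procedure proposed here: it assumes Gaussian noise with unit variance, at most one changepoint, and a CUSUM-type detector, and the function names `best_split`, `naive_p_value`, and `naive_rejection_rate` are hypothetical.

```python
import math
import random


def best_split(y):
    """Return (tau, |z|) for the split 1..n-1 with the largest standardised
    difference in means (a CUSUM-type statistic, unit variance assumed)."""
    n = len(y)
    total = sum(y)
    best_tau, best_stat = 1, -1.0
    left = 0.0
    for tau in range(1, n):
        left += y[tau - 1]
        right = total - left
        z = (left / tau - right / (n - tau)) / math.sqrt(1 / tau + 1 / (n - tau))
        if abs(z) > best_stat:
            best_tau, best_stat = tau, abs(z)
    return best_tau, best_stat


def naive_p_value(z_abs):
    """Two-sided normal p-value that ignores how the split was chosen."""
    return math.erfc(z_abs / math.sqrt(2))


def naive_rejection_rate(n=100, reps=500, alpha=0.05, seed=0):
    """Fraction of null data sets (no change at all) on which the naive
    test rejects at level alpha after selecting the split from the data."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]
        _, z_abs = best_split(y)
        if naive_p_value(z_abs) < alpha:
            rejections += 1
    return rejections / reps


if __name__ == "__main__":
    rate = naive_rejection_rate()
    print(f"naive rejection rate under the null: {rate:.3f} (nominal 0.05)")
```

Under these assumptions the naive test should reject far more often than the nominal level, because the test location was chosen to make the statistic large. Conditioning on the selection event is what removes this bias.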