In this paper, we propose a new method for offline change-point detection on some parameters of the distribution of a random vector. We introduce a penalized maximum likelihood approach that can be efficiently computed by a dynamic programming algorithm or approximated by a fast greedy binary splitting algorithm. We prove both algorithms converge almost surely to the set of change-points under very general assumptions on the distribution and independent sampling of the random vector. In particular, we show the assumptions leading to the consistency of the algorithms are satisfied by categorical and Gaussian random variables. This new approach is motivated by the problem of identifying homozygosity islands on the genome of individuals in a population. Our method directly tackles the issue of identification of the homozygosity islands at the population level, without the need of analyzing single individuals and then combining the results, as is made nowadays in state-of-the-art approaches.
翻译:在本文中,我们建议了一种新的方法,用于对随机矢量分布的某些参数进行离线变化点探测。我们采用了一种惩罚性的最大可能性方法,可以通过动态编程算法或快速贪婪的二进制分算法来有效计算;我们证明这两种算法几乎肯定地在对随机矢量分布和独立抽样的非常一般假设下与一套变化点相融合;特别是,我们显示了导致算法一致性的假设是绝对的和高斯随机变量所满足的。这一新方法的动因是查明人口中个人基因组中的同系性岛屿的问题。我们的方法直接解决了在人口层面识别同系性岛屿的问题,而无需分析单个个人,然后按照当今最先进的方法将结果结合起来。