In an extant population, how much information do extant individuals provide on the pedigree of their ancestors? Recent work by Kim, Mossel, Ramnarayan and Turner (2020) studied this question under a number of simplifying assumptions, including random mating, fixed-length inheritance blocks and a sufficiently large founding population. They showed that, under these conditions, if the average number of offspring is a sufficiently large constant, then a large fraction of the pedigree structure and genetic content can be recovered by an algorithm they named REC-GEN. We study the performance of REC-GEN on simulated data generated according to the model. As a first step, we improve the running time of the algorithm. However, we observe that even the faster version performs poorly in all of our simulations at recovering the pedigree beyond two generations. We claim that this is due to the inbreeding present in any setting where the algorithm can be run, even on simulated data. To support the claim, we show that a main step of the algorithm, called ancestral reconstruction, performs accurately in an idealized setting with no inbreeding but performs poorly in random mating populations. To overcome the poor behavior of REC-GEN, we introduce a belief-propagation-based heuristic that accounts for the inbreeding and performs much better in our simulations.
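To illustrate why inbreeding is hard to avoid in a finite random mating population, the following is a minimal sketch and not the simulation used in this work. It assumes a simplified model in which every individual draws two distinct parents uniformly at random from the previous generation; the function names `simulate_parents` and `distinct_ancestors` and the parameters `pop_size`, `generations` and `depth` are hypothetical. It compares the number of distinct ancestors of one extant individual at depth k with the 2^k ancestors an inbreeding-free pedigree would have.

```python
import random


def simulate_parents(pop_size, generations, rng):
    """Return parents[g][i] = (a, b), the indices in generation g + 1
    (one step further back in time) of the two parents of individual i
    in generation g. Generation 0 is the extant population.

    Assumption: parents are two distinct individuals drawn uniformly at
    random, a simplification of random mating."""
    return [
        [tuple(rng.sample(range(pop_size), 2)) for _ in range(pop_size)]
        for _ in range(generations)
    ]


def distinct_ancestors(parents, individual, depth):
    """Count the distinct ancestors of `individual` at depths 1..depth."""
    current = {individual}
    counts = []
    for g in range(depth):
        # Move one generation back and merge ancestors that coincide.
        current = {p for i in current for p in parents[g][i]}
        counts.append(len(current))
    return counts


rng = random.Random(0)  # fixed seed so the run is reproducible
parents = simulate_parents(pop_size=50, generations=6, rng=rng)
counts = distinct_ancestors(parents, individual=0, depth=6)

for k, c in enumerate(counts, start=1):
    print(f"depth {k}: {c} distinct ancestors, {2 ** k} in an inbreeding-free pedigree")
```

Once 2^k exceeds the population size, some ancestors must appear more than once in the pedigree of a single individual, so a shortfall at the deepest level is guaranteed in this toy setting; in practice repeats tend to appear earlier.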