Community detection (CD) algorithms are applied to Hi-C data to discover new communities of loci in the 3D conformation of human and mouse DNA. We find that CD has distinct advantages over pre-existing methods: (1) it can find a variable number of communities, (2) it can detect communities of DNA loci that are either adjacent or distant in the 1D sequence, and (3) it yields a principled value of k, the number of communities present. With k fixed at 2, our method recovers the earlier findings of Lieberman-Aiden et al. (2009); with k treated as a free parameter, it selects k = 6 as the optimal value and discovers new candidate communities. In addition to discovering large communities that partition entire chromosomes, we show that CD can detect small-scale topologically associating domains (TADs) such as those found in Dixon et al. (2012). CD thus provides a natural and flexible statistical framework for understanding the folding structure of DNA at multiple scales in Hi-C data.
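As a rough illustration of advantage (2), the following sketch scores partitions of a small synthetic contact matrix and searches for the best two-community split. It rests on several assumptions not taken from the text: Newman modularity stands in for the CD objective (the abstract does not name the algorithm used), the 6-locus matrix is invented, and the names `modularity` and `best_bipartition` are hypothetical.

```python
from itertools import product


def modularity(contacts, labels):
    """Newman modularity of a labeling on a symmetric weighted contact matrix."""
    n = len(contacts)
    degree = [sum(row) for row in contacts]
    total = sum(degree)  # equals 2m for a symmetric matrix
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                # observed contact minus the contact expected from degrees alone
                q += contacts[i][j] - degree[i] * degree[j] / total
    return q / total


def best_bipartition(contacts):
    """Exhaustive search over two-community labelings (small matrices only)."""
    n = len(contacts)
    best_labels, best_q = None, float("-inf")
    # Fix locus 0 in community 0 so each split is enumerated only once.
    for rest in product((0, 1), repeat=n - 1):
        labels = (0,) + rest
        q = modularity(contacts, labels)
        if q > best_q:
            best_labels, best_q = labels, q
    return list(best_labels), best_q


# Synthetic (assumed) data: locus 4 is far from loci 0 and 1 in the 1D
# sequence but contacts them strongly, so it belongs to their community.
planted = [0, 0, 1, 1, 0, 1]
contacts = [
    [0 if i == j else (5 if planted[i] == planted[j] else 1) for j in range(6)]
    for i in range(6)
]

# A split that only respects 1D adjacency, for comparison.
contiguous = [0, 0, 0, 1, 1, 1]

found_labels, found_q = best_bipartition(contacts)
print("best split:", found_labels, "Q =", round(found_q, 4))
print("contiguous split Q =", round(modularity(contacts, contiguous), 4))
```

The exhaustive search is only feasible for toy inputs; it is used here so that the example has no dependencies and its result does not depend on a heuristic optimizer.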