Background: The nucleus of eukaryotic cells spatially packages chromosomes into a distinct, hierarchical organization that plays critical roles in transcriptional regulation. High-throughput chromosome conformation capture methods such as Hi-C have revealed topologically associating domains (TADs), which are defined by chromatin interactions that occur preferentially within them.

Results: Here, we introduce a novel method, HiCKey, to decipher hierarchical TAD structures in Hi-C data and compare them across samples. We first derive a generalized likelihood-ratio (GLR) test for detecting change-points in an interaction matrix whose entries follow a negative binomial distribution or a more general mixture distribution. We then employ several optimal search strategies to decipher hierarchical TADs, using p-values calculated by the GLR test. Large-scale validations on simulated data show that HiCKey recalls known TADs with good precision and is robust against random-collision noise in chromatin interactions. By applying HiCKey to Hi-C data from seven human cell lines, we identified multiple layers of TAD organization across them, although the vast majority of TAD hierarchies had no more than four layers. In particular, we found that TAD boundaries are significantly enriched in active chromosomal regions compared with repressed regions, indicating finer hierarchical architectures in active regions for precise regulation of gene transcription.

Conclusions: HiCKey is optimized for processing the large matrices constructed from high-resolution Hi-C experiments. The method and the theoretical result of the GLR test provide a general framework for significance testing of similar chromatin interaction data that may not fully follow negative binomial distributions but rather more general mixture distributions.
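To make the idea of a GLR change-point test on an interaction matrix more concrete, the following is a minimal sketch under simplifying assumptions; it is not HiCKey's actual statistic or search procedure. It assumes the counts are negative binomial with a known, shared dispersion `r`, and for each candidate boundary `k` it compares a model in which the left-domain, right-domain and between-domain entries of the upper triangle each have their own mean against a model with a single shared mean. The function names `nb_profile_loglik` and `glr_boundary_scan` are hypothetical. Because the maximum over all `k` does not follow a standard chi-square law, these raw statistics are not p-values; turning them into p-values requires a derivation like the one the abstract describes.

```python
import math


def nb_profile_loglik(values, r):
    """Negative binomial log-likelihood evaluated at the MLE of the mean,
    with the dispersion r held fixed (an assumption of this sketch).
    Terms that do not depend on the mean (lgamma(x + r), log x!, ...) are
    dropped because they cancel in the likelihood ratio."""
    n = len(values)
    s = sum(values)
    if n == 0 or s == 0:
        return 0.0  # all-zero group: likelihood is maximized as the mean -> 0
    mu = s / n  # with r known, the MLE of the NB mean is the sample mean
    return n * r * math.log(r / (r + mu)) + s * math.log(mu / (r + mu))


def glr_boundary_scan(matrix, r=1.0):
    """Return {k: GLR statistic} for every candidate boundary k in 1..n-1.

    The upper triangle (i <= j) of a symmetric count matrix is split into
    left-domain (j < k), right-domain (i >= k) and between-domain
    (i < k <= j) entries. The statistic is
    2 * (loglik with three means - loglik with one shared mean)."""
    n = len(matrix)
    upper = [(i, j) for i in range(n) for j in range(i, n)]
    null_ll = nb_profile_loglik([matrix[i][j] for i, j in upper], r)
    stats = {}
    for k in range(1, n):
        left = [matrix[i][j] for i, j in upper if j < k]
        right = [matrix[i][j] for i, j in upper if i >= k]
        between = [matrix[i][j] for i, j in upper if i < k <= j]
        alt_ll = sum(nb_profile_loglik(g, r) for g in (left, right, between))
        stats[k] = 2.0 * (alt_ll - null_ll)
    return stats


# Toy 6x6 matrix: two 3-bin domains with dense internal contacts (10)
# and sparse contacts between them (1).
toy = [[10 if (i < 3) == (j < 3) else 1 for j in range(6)] for i in range(6)]
scores = glr_boundary_scan(toy, r=1.0)
print(max(scores, key=scores.get))  # 3: the boundary between the two blocks
```

Applying such a scan recursively inside each detected domain is one simple way to obtain nested (hierarchical) boundaries, but the abstract does not specify that HiCKey's search strategies work this way.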