Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-Seq) provides vital insights into locations on the genome with differential DNA occupancy between experimental states. However, because ChIP-Seq data are collected experimentally, they must be normalized between samples before differential binding analysis can properly assess which genomic regions have differential DNA occupancy. Although between-sample normalization is crucial for downstream differential binding analysis, the technical conditions underlying between-sample ChIP-Seq normalization methods have yet to be specifically examined. We identify three important technical conditions underlying ChIP-Seq between-sample normalization methods: symmetric differential DNA occupancy, equal total DNA occupancy, and equal background binding across experimental states. We categorize popular ChIP-Seq normalization methods by their technical conditions and simulate ChIP-Seq read count data to illustrate how satisfying a normalization method's technical conditions affects downstream differential binding analysis. We assess the similarity between normalization methods in experimental CUT&RUN data to externally verify our simulation findings. Our simulation and experimental results underscore that satisfying the technical conditions underlying the selected between-sample normalization method is crucial to conducting biologically meaningful downstream differential binding analysis. We suggest that researchers use their understanding of the ChIP-Seq experiment at hand to guide their choice of between-sample normalization method when possible. When there is uncertainty about which technical conditions are met, researchers could use the intersection of the differentially bound peaksets derived from different normalization methods to determine which regions have differential DNA occupancy between experimental states.
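
The intersection strategy suggested above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes that every normalization method was run against the same consensus peakset, so that peaks can be matched by a shared identifier. The function name `consensus_peaks`, the method labels, and the peak IDs are all hypothetical.

```python
def consensus_peaks(calls_by_method):
    """Return the peak IDs called differentially bound under every method.

    calls_by_method maps a normalization method label to the collection of
    peak IDs that were called differentially bound after that normalization.
    Assumption: all methods share one consensus peakset, so IDs are comparable.
    """
    peak_sets = [set(peaks) for peaks in calls_by_method.values()]
    if not peak_sets:
        # No methods supplied, so there is nothing to intersect.
        return set()
    return set.intersection(*peak_sets)


# Hypothetical differentially bound peaks from three normalization methods.
calls = {
    "method_a": {"peak_1", "peak_2", "peak_3", "peak_7"},
    "method_b": {"peak_2", "peak_3", "peak_5"},
    "method_c": {"peak_2", "peak_3", "peak_7"},
}

consensus = consensus_peaks(calls)
print(sorted(consensus))  # ['peak_2', 'peak_3']
```

Only peaks called under all methods survive, which makes the result conservative: a region that is flagged under a single normalization method, and therefore depends on that method's technical conditions, is excluded.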