The single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq) profiles genome-wide chromatin accessibility in thousands of cells, revealing regulatory landscapes at high resolution. However, the analysis is challenging because the data are high-dimensional and sparse. Several approaches have been developed, including transformations such as term frequency-inverse document frequency (TF-IDF) and dimension reduction methods such as singular value decomposition (SVD), factor analysis, and autoencoders. Yet these methods have not been compared comprehensively, and it remains unclear what the best practice is for analyzing scATAC-seq data. We compared several scenarios for transformation and dimension reduction, as well as SVD-based feature analysis, to investigate potential enhancements in scATAC-seq information retrieval. Additionally, we investigated whether autoencoders benefit from the TF-IDF transformation. Our results reveal that the TF-IDF transformation generally leads to improved clustering and to the extraction of biologically relevant features.
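To make the TF-IDF and SVD combination concrete, the following is a minimal sketch, not the pipeline evaluated in this study. It assumes a dense cells-by-peaks matrix, one common TF-IDF variant (per-cell frequency scaled by a log inverse peak frequency), and a full SVD truncated to the leading components. The function names `tfidf` and `svd_embed` and the simulated data are hypothetical and chosen only for illustration.

```python
import numpy as np


def tfidf(counts):
    """Apply one common TF-IDF variant to a cells x peaks matrix (assumed layout)."""
    counts = np.asarray(counts, dtype=float)
    # Term frequency: divide each cell (row) by its total count.
    cell_totals = counts.sum(axis=1, keepdims=True)
    tf = counts / np.maximum(cell_totals, 1.0)
    # Inverse document frequency: down-weight peaks open in many cells.
    n_cells = counts.shape[0]
    cells_per_peak = (counts > 0).sum(axis=0)
    idf = np.log1p(n_cells / np.maximum(cells_per_peak, 1))
    return tf * idf


def svd_embed(matrix, n_components):
    """Return cell embeddings and peak loadings from a truncated SVD."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    embedding = u[:, :n_components] * s[:n_components]
    loadings = vt[:n_components]
    return embedding, loadings


# Simulated sparse binary accessibility matrix: 50 cells x 200 peaks (illustrative only).
rng = np.random.default_rng(0)
counts = (rng.random((50, 200)) < 0.05).astype(float)

weighted = tfidf(counts)
embedding, loadings = svd_embed(weighted, n_components=10)
```

The rows of `embedding` could then be passed to a clustering method, while the rows of `loadings` indicate which peaks contribute most to each component. For real datasets a sparse matrix and a truncated solver would be preferable to the dense SVD used here.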