Biological cells can be distinguished by their phenotype or at the molecular level, based on their genome, epigenome, and transcriptome. This paper focuses on the transcriptome, which encompasses all the RNA transcripts in a given cell population and indicates which genes are being expressed at a given time. We consider single-cell RNA sequencing data and develop a novel model-based clustering method to group cells based on their transcriptome profiles. Our clustering approach accounts for zero inflation in the data, which can arise from genuine biological zeros or from technical noise. The proposed clustering model is a mixture of zero-inflated Poisson or zero-inflated negative binomial distributions, with parameters estimated using the expectation-maximization (EM) algorithm. We evaluate the performance of the proposed methodology through simulation studies and analyses of publicly available datasets.
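
As a rough illustration of the zero-inflated Poisson case, a mixture of this kind can be written as follows. The notation is an assumption made here for illustration and is not taken from the paper: $y_{ng}$ is the count for gene $g$ in cell $n$, $K$ is the number of clusters, $\pi_k$ are mixing proportions, $\phi_{kg}$ is a zero-inflation probability, and $\lambda_{kg}$ is a Poisson rate. The sketch also assumes that genes are conditionally independent given the cluster; the authors' parameterization may differ, for instance by including cell-level size factors or covariates.

```latex
% Zero-inflated Poisson probability mass for one gene in cluster k (assumed notation)
f_{kg}(y) = \phi_{kg}\,\mathbb{1}\{y = 0\}
          + (1 - \phi_{kg})\,\frac{e^{-\lambda_{kg}}\,\lambda_{kg}^{\,y}}{y!},
\qquad y = 0, 1, 2, \dots

% Mixture density for the count vector of cell n, assuming conditional independence across genes
p(\mathbf{y}_n) = \sum_{k=1}^{K} \pi_k \prod_{g=1}^{G} f_{kg}(y_{ng}),
\qquad \sum_{k=1}^{K} \pi_k = 1

% E-step of a generic EM iteration: posterior probability that cell n belongs to cluster k
\tau_{nk} = \frac{\pi_k \prod_{g=1}^{G} f_{kg}(y_{ng})}
                 {\sum_{l=1}^{K} \pi_l \prod_{g=1}^{G} f_{lg}(y_{ng})}
```

Under these assumptions, each cell would be assigned to the cluster with the largest $\tau_{nk}$ once the EM iterations converge. The zero-inflated negative binomial case replaces the Poisson term with a negative binomial probability mass carrying an additional dispersion parameter.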