While single-cell RNA-seq enables the investigation of cell-type effects on the transcriptome, the effect of the tissue environment alone has not been well investigated. Because tissues and cell types co-occur in a biased manner in the body, it has been difficult to evaluate the pure tissue environmental effect through omics data mining. When evaluating the effects of discrete variables such as cell type, tissue, and other categorical variables, it is important to prevent statistical confounding among them. We propose a novel method that enumerates suitable analysis units of variables for estimating the effects of tissue environment by extending the maximal biclique enumeration problem for bipartite graphs to $k$-partite hypergraphs. We applied the proposed method to Tabula Muris Senis, a large mouse single-cell transcriptome dataset, to evaluate pure tissue environmental effects on gene expression. Data mining using the proposed method revealed pure tissue environmental effects on gene expression and their age-related changes among adipose sub-tissues. The proposed method helps evaluate the effects of discrete variables in exploratory data mining of large-scale genomics datasets.
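To make the idea of an "analysis unit" concrete, the following is a minimal brute-force sketch, not the enumeration algorithm proposed in the paper. It assumes that each observed combination of variable values (here tissue, cell type, age group) is a hyperedge of a $k$-partite hypergraph, and that an analysis unit is a tuple of value sets whose full Cartesian product is observed and to which no further value can be added. The function names (`nonempty_subsets`, `is_complete`, `maximal_complete_units`) and the toy labels are hypothetical and are not taken from Tabula Muris Senis.

```python
from itertools import chain, combinations, product


def nonempty_subsets(items):
    """Yield every non-empty subset of items as a sorted tuple."""
    items = sorted(items)
    return chain.from_iterable(
        combinations(items, r) for r in range(1, len(items) + 1)
    )


def is_complete(parts, edges):
    """True if every combination drawn from the parts is an observed hyperedge."""
    return all(combo in edges for combo in product(*parts))


def maximal_complete_units(edges):
    """Enumerate maximal complete units by exhaustive search (exponential time)."""
    edges = set(edges)
    k = len(next(iter(edges)))
    domains = [sorted({e[i] for e in edges}) for i in range(k)]
    found = set()
    # Fix subsets for the first k-1 variables, then close over the last one.
    for head in product(*(nonempty_subsets(d) for d in domains[:-1])):
        last = tuple(v for v in domains[-1] if is_complete(head + ((v,),), edges))
        if not last:
            continue
        parts = head + (last,)
        # A unit is maximal if no single value can be added to any part.
        extendable = any(
            is_complete(parts[:i] + (parts[i] + (v,),) + parts[i + 1:], edges)
            for i in range(k)
            for v in domains[i]
            if v not in parts[i]
        )
        if not extendable:
            found.add(parts)
    return found


# Hypothetical toy observations: (tissue, cell type, age group).
TOY_EDGES = [
    ("tissue_A", "celltype_X", "young"),
    ("tissue_A", "celltype_X", "old"),
    ("tissue_B", "celltype_X", "young"),
    ("tissue_B", "celltype_X", "old"),
    ("tissue_A", "celltype_Y", "young"),
    ("tissue_B", "celltype_Y", "young"),
    ("tissue_C", "celltype_Y", "young"),
]

units = maximal_complete_units(TOY_EDGES)
for unit in sorted(units):
    print(unit)
```

Within each unit, every tissue is observed with every cell type and age group, so comparing tissues inside a unit does not mix in a cell-type or age imbalance. The exhaustive search is only practical for a handful of values per variable; a dataset of realistic size would need a dedicated enumeration algorithm.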