Integrating data from different platforms, such as bulk and single-cell RNA sequencing, is crucial for improving the accuracy and interpretability of complex biological analyses such as cell type deconvolution. However, this task is complicated by measurement and biological heterogeneity between the target and reference datasets. For cell type deconvolution, existing methods often neglect the correlation and uncertainty in cell type proportion estimates, which can lead to false positives in downstream comparisons across multiple individuals. We introduce MEAD, a comprehensive statistical framework that not only estimates cell type proportions but also provides asymptotically valid statistical inference on the estimates. A key contribution is an identifiability result that rigorously establishes the conditions under which cell type proportions are identifiable despite arbitrary heterogeneity in measurement biases between platforms. MEAD also supports comparing cell type proportions across individuals after deconvolution, accounting for gene-gene correlations and biological variability. In simulations and real-data analyses, MEAD demonstrates superior reliability for inferring cell type compositions in complex biological systems.