In applications such as gene regulatory network analysis based on single-cell RNA sequencing data, samples often come from a mixture of different populations, each with its own network. Available graphical models usually assume that all samples come from the same population and share the same network, so one has to first cluster the samples and then apply an existing method to infer the network of each cluster separately. However, this two-step procedure ignores the uncertainty in the clustering step and can therefore lead to inaccurate network estimation. Motivated by these applications, we consider the mixture Poisson log-normal model for network inference of count data from mixed populations. The latent precision matrices of the mixture model correspond to the networks of the different populations and can be jointly estimated by maximizing the lasso-penalized log-likelihood. Under rather mild conditions, we show that the mixture Poisson log-normal model is identifiable and has a positive definite Fisher information matrix. Consistency of the maximum lasso-penalized log-likelihood estimator is also established. To avoid the intractable optimization of the log-likelihood, we develop an algorithm called VMPLN based on the variational inference method. Comprehensive simulations and analyses of real single-cell RNA sequencing data demonstrate the superior performance of VMPLN.
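To make the generative structure of a mixture Poisson log-normal model concrete, the following is a minimal Python sketch, not the VMPLN algorithm itself. It assumes the simplest form of the model: each sample draws a latent population label, then a Gaussian latent vector whose precision matrix encodes that population's network, and finally Poisson counts whose rates are the exponentiated latent values. The function names `simulate_mpln` and `lasso_penalty`, the omission of any library-size scaling, and the choice to penalize only off-diagonal precision entries are illustrative assumptions that the abstract does not specify.

```python
import numpy as np


def simulate_mpln(n, weights, means, precisions, seed=0):
    """Draw n count vectors from a mixture Poisson log-normal model.

    Hypothetical helper: the parameterization is an assumption
    made for illustration, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    weights = np.asarray(weights, dtype=float)
    # Latent population label for each sample.
    labels = rng.choice(len(weights), size=n, p=weights)
    p = len(means[0])
    counts = np.empty((n, p), dtype=np.int64)
    for g, (mu, theta) in enumerate(zip(means, precisions)):
        idx = np.flatnonzero(labels == g)
        if idx.size == 0:
            continue
        # The covariance is the inverse of the population's precision matrix.
        cov = np.linalg.inv(theta)
        latent = rng.multivariate_normal(mu, cov, size=idx.size)
        # Counts are Poisson with log-rate equal to the latent vector.
        counts[idx] = rng.poisson(np.exp(latent))
    return counts, labels


def lasso_penalty(precisions, lam):
    """L1 penalty on off-diagonal precision entries, summed over populations.

    Assumption: the diagonal is left unpenalized.
    """
    total = 0.0
    for theta in precisions:
        off_diag = theta - np.diag(np.diag(theta))
        total += np.abs(off_diag).sum()
    return lam * total


# Two populations over three genes, each with a different sparse network.
theta_a = np.array([[1.0, 0.4, 0.0], [0.4, 1.0, 0.4], [0.0, 0.4, 1.0]])
theta_b = np.array([[1.0, 0.0, 0.4], [0.0, 1.0, 0.0], [0.4, 0.0, 1.0]])
counts, labels = simulate_mpln(
    200, [0.6, 0.4], [np.zeros(3), np.ones(3)], [theta_a, theta_b]
)
print(counts[:5])
print(lasso_penalty([theta_a, theta_b], lam=0.5))
```

In this sketch the labels are known because the data are simulated. With real data they are latent, which is why estimating the precision matrices jointly with the clustering, rather than after it, can carry the clustering uncertainty through to the network estimates.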