In this paper, we study the problem of inferring spatially-varying Gaussian Markov random fields (SV-GMRFs), where the goal is to learn a network of sparse, context-specific GMRFs representing network relationships between genes. An important application of SV-GMRFs is the inference of gene regulatory networks from spatially resolved transcriptomics datasets. Current work on the inference of SV-GMRFs is based on regularized maximum likelihood estimation (MLE) and suffers from overwhelmingly high computational cost due to its highly nonlinear nature. To alleviate this challenge, we propose a simple and efficient optimization problem in lieu of MLE that comes equipped with strong statistical and computational guarantees. Our proposed optimization problem is extremely efficient in practice: we can solve instances of SV-GMRFs with more than 2 million variables in less than 2 minutes. We apply the developed framework to study how gene regulatory networks in glioblastoma are spatially rewired within tissue, and we identify prominent activity of the transcription factor HES4 and ribosomal proteins as characterizing the gene expression network in the tumor perivascular niche, which is known to harbor treatment-resistant stem cells.