A frequent challenge when using graphical models in applications is that the sample size is limited relative to the number of parameters to be learned. Our motivation stems from applications where one has external data, in the form of networks between variables, that provide valuable information to help improve inference. Specifically, we study the relation between COVID-19 cases and social and geographical network data, and between stock market returns and economic and policy networks extracted from text data. We propose a graphical LASSO framework where likelihood penalties are guided by the external network data. We also propose a spike-and-slab prior framework that describes how partial correlations depend on the networks, which helps interpret the fitted graphical model and its relationship to the networks. We develop computational schemes and software implementations in R and probabilistic programming languages. Our applications show how incorporating network data can substantially improve interpretation, statistical accuracy, and out-of-sample prediction, in some instances using considerably sparser graphical models than would otherwise have been estimated.
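As a rough illustration of what a network-guided penalty could look like, consider the following sketch. The notation is an assumption made here for exposition and is not taken from the text above: $S$ denotes the sample covariance matrix, $\Theta = (\theta_{jk})$ the precision matrix, $A^{(1)}, \dots, A^{(Q)}$ the external networks with entries $a^{(q)}_{jk}$, and $\beta_0, \dots, \beta_Q$ hypothetical coefficients linking the networks to the penalties. One possible form lets each edge's penalty depend log-linearly on the network entries:

```latex
\hat{\Theta} = \arg\max_{\Theta \succ 0} \; \log\det\Theta - \operatorname{tr}(S\Theta)
  - \sum_{j<k} \lambda_{jk}\,|\theta_{jk}|,
\qquad
\lambda_{jk} = \exp\Big(\beta_0 + \sum_{q=1}^{Q} \beta_q\, a^{(q)}_{jk}\Big).
```

Under this assumed form, a negative $\beta_q$ applies less shrinkage to pairs of variables that are strongly connected in network $q$, and setting $\beta_1 = \dots = \beta_Q = 0$ recovers the standard graphical LASSO with a single penalty $\lambda = \exp(\beta_0)$.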