通过 Minipatch 集合快速和精确的巨集数据图表学习 (Fast and Accurate Graph Learning for Huge Data via Minipatch Ensembles)

from arxiv, This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

Gaussian graphical models provide a powerful framework for uncovering conditional dependence relationships between sets of nodes; they have found applications in a wide variety of fields including sensor and communication networks, physics, finance, and computational biology. Often, one observes data on the nodes and the task is to learn the graph structure, or perform graphical model selection. While this is a well-studied problem with many popular techniques, there are typically three major practical challenges: i) many existing algorithms become computationally intractable in huge-data settings with tens of thousands of nodes; ii) the need for separate data-driven hyperparameter tuning considerably adds to the computational burden; iii) the statistical accuracy of selected edges often deteriorates as the dimension and/or the complexity of the underlying graph structures increase. We tackle these problems by developing the novel Minipatch Graph (MPGraph) estimator. Our approach breaks up the huge graph learning problem into many smaller problems by creating an ensemble of tiny random subsets of both the observations and the nodes, termed minipatches. We then leverage recent advances that use hard thresholding to solve the latent variable graphical model problem to consistently learn the graph on each minipatch. Our approach is computationally fast, embarrassingly parallelizable, memory efficient, and has integrated stability-based hyperparamter tuning. Additionally, we prove that under weaker assumptions than that of the Graphical Lasso, our MPGraph estimator achieves graph selection consistency. We compare our approach to state-of-the-art computational approaches for Gaussian graphical model selection including the BigQUIC algorithm, and empirically demonstrate that our approach is not only more statistically accurate but also extensively faster for huge graph learning problems.

翻译：高斯图形模型为发现各节点之间的有条件依赖关系提供了一个强大的框架。它们发现在各种各样的领域, 包括传感器和通信网络、物理、金融、计算生物学, 都存在不同的应用。通常, 人们观察节点的数据, 任务通常是学习图形结构, 或进行图形模型选择。虽然这是许多流行技术中研究周密的问题, 但通常存在三大实际挑战 : i) 许多现有的算法在巨型数据环境中变得难以计算, 有成千上万个节点; ii 需要由数据驱动的超分计调整, 大大增加计算负担; iii 某些边缘的统计精确度往往随着基本图形结构的尺寸和(或)复杂性的增加而恶化。我们通过开发新颖的 Minipatch 图表(MPGraph) 估测仪来解决这些问题。我们的方法将巨大的图表学习问题分解成许多较小的问题, 创建一个小的随机子集, 观察和节点, 称为微型节点。然后我们利用最近的进展模型, 使用硬门槛的精确度 Q- ralalalal lialalalalalalalalalalalalalalal 。