High-dimensional data analysis typically focuses on low-dimensional structure, often to aid interpretation and computational efficiency. Graphical models provide a powerful methodology for learning the conditional independence structure in multivariate data by representing variables as nodes and dependencies as edges. Inference is often focused on individual edges in the latent graph. Nonetheless, there is increasing interest in determining more complex structures, such as communities of nodes, for multiple reasons, including more effective information retrieval and better interpretability. In this work, we propose a hierarchical graphical model where we first cluster nodes and then, at the higher level, investigate the relationships among groups of nodes. Specifically, nodes are partitioned into supernodes with a data-coherent size-biased tessellation prior which combines ideas from Bayesian nonparametrics and Voronoi tessellations. This construct also allows accounting for the dependence of nodes within supernodes. At the higher level, dependence structure among supernodes is modeled through a Gaussian graphical model, where the focus of inference is on superedges. We provide theoretical justification for our modeling choices. We design tailored Markov chain Monte Carlo schemes, which also enable parallel computations. We demonstrate the effectiveness of our approach for large-scale structure learning in simulations and a transcriptomics application.
翻译:暂无翻译