In multivariate data analysis, it is often important to estimate a graph characterizing dependence among $p$ variables. A popular strategy uses the non-zero entries in a $p\times p$ covariance or precision matrix, which typically requires restrictive modeling assumptions for accurate graph recovery. To improve model robustness, we instead focus on estimating the {\em backbone} of the dependence graph. We use a spanning tree likelihood, based on a minimalist graphical model that is purposely oversimplified. Taking a Bayesian approach, we place a prior on the space of trees and quantify uncertainty in the graphical model. In both theory and experiments, we show that this model does not require the population graph to be a spanning tree or the covariance to satisfy assumptions beyond positive-definiteness. The model accurately recovers the backbone of the population graph at a rate competitive with existing approaches but with better robustness. We show combinatorial properties of the spanning tree, which may be of independent interest, and develop an efficient Gibbs sampler for Bayesian inference. Analyzing electroencephalography data using a hidden Markov model with each latent state modeled by a spanning tree, we show that the results are much more interpretable than those of popular alternatives.
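To make the notion of a spanning-tree backbone concrete, the following is a minimal sketch of a simpler, non-Bayesian point estimate: a Chow-Liu-style maximum-weight spanning tree over pairwise Gaussian mutual information. It is not the spanning tree likelihood, tree prior, or Gibbs sampler described above. The function name `backbone_tree` and the Gaussian mutual-information edge weight are assumptions made purely for illustration.

```python
import numpy as np


def backbone_tree(X):
    """Illustrative sketch (hypothetical helper, not the paper's method).

    Returns the p - 1 edges of a maximum-weight spanning tree, where the
    weight of edge (i, j) is the Gaussian mutual information
    -0.5 * log(1 - rho_ij^2) computed from sample correlations.
    X is an (n, p) array with one column per variable.
    """
    p = X.shape[1]
    corr = np.corrcoef(X, rowvar=False)

    # Clip so that perfectly correlated pairs do not produce log(0).
    rho2 = np.clip(corr ** 2, 0.0, 1.0 - 1e-12)
    weight = -0.5 * np.log1p(-rho2)
    np.fill_diagonal(weight, -np.inf)  # forbid self-loops

    # Prim's algorithm, growing the tree from node 0.
    in_tree = np.zeros(p, dtype=bool)
    in_tree[0] = True
    best_w = weight[0].copy()           # best weight linking each node to the tree
    best_from = np.zeros(p, dtype=int)  # tree node achieving that weight
    edges = []
    for _ in range(p - 1):
        candidates = np.where(in_tree, -np.inf, best_w)
        j = int(np.argmax(candidates))
        edges.append((int(best_from[j]), j))
        in_tree[j] = True
        improved = weight[j] > best_w
        best_w = np.where(improved, weight[j], best_w)
        best_from = np.where(improved, j, best_from)
    return edges
```

A single tree obtained this way carries no uncertainty quantification, which is the gap a prior over trees and posterior sampling are meant to address.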