Learning Clique Forests

We propose a topological learning algorithm for estimating the conditional dependency structure of large sets of random variables from sparse and noisy data. The algorithm, named Maximally Filtered Clique Forest (MFCF), produces a clique forest and an associated Markov Random Field (MRF) by generalising Prim's minimum spanning tree algorithm. To the best of our knowledge, the MFCF presents three elements of novelty with respect to existing structure learning approaches. The first is the repeated application of a local topological move, the clique expansion, that preserves the decomposability of the underlying graph. Through this move, the decomposability and the calculation of scores are handled incrementally at the variable (rather than edge) level, which provides better computational performance and an intuitive application of multivariate statistical tests. The second is the capability to accommodate a variety of score functions: while this paper focuses on multivariate normal distributions, the algorithm can be directly generalised to different types of statistics. The third is the variable range of allowed clique sizes, an adjustable topological constraint that acts as a topological penalizer and provides a way to tackle sparsity at the $\ell_0$ semi-norm level; this allows a clean decoupling of structure learning and parameter estimation. The MFCF produces a representation of the clique forest, together with a perfect ordering of the cliques and a perfect elimination ordering for the vertices. As an example, we propose an application to covariance selection models and show that the MFCF outperforms the Graphical Lasso for a number of classes of matrices.
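To make the clique expansion move concrete, the following is a minimal illustrative sketch and not the MFCF implementation itself. It assumes a symmetric similarity matrix `W` (for instance squared correlations) and uses the sum of similarities between a new vertex and a candidate separator as the gain. The function name `greedy_clique_forest`, the additive gain, and the absence of any gain threshold or statistical test are assumptions made for illustration; without a threshold the result is a single connected clique tree rather than a forest.

```python
import numpy as np


def greedy_clique_forest(W, max_clique_size=3):
    """Prim-like greedy construction using clique expansions (illustrative).

    W is a symmetric similarity matrix. Each step attaches one new vertex to
    a subset (the separator) of an existing clique, so the graph stays
    decomposable by construction.
    """
    W = np.array(W, dtype=float)        # work on a float copy
    np.fill_diagonal(W, -np.inf)        # never pair a vertex with itself
    n = W.shape[0]

    # Seed with the heaviest edge, as Prim's algorithm would.
    i, j = np.unravel_index(np.argmax(W), W.shape)
    i, j = int(i), int(j)
    cliques = [frozenset((i, j))]
    separators = []
    order = [i, j]                      # insertion order of the vertices
    remaining = set(range(n)) - {i, j}

    while remaining:
        best = None
        for v in remaining:
            for c in cliques:
                # Candidate separator: the members of c most similar to v,
                # limited so the new clique respects max_clique_size.
                members = sorted(c, key=lambda u: W[v, u], reverse=True)
                sep = members[:max_clique_size - 1]
                gain = sum(W[v, u] for u in sep)   # assumed additive gain
                if best is None or gain > best[0]:
                    best = (gain, v, frozenset(sep), c)

        _, v, sep, parent = best
        if sep == parent:
            cliques.remove(parent)      # parent clique grows in place
        else:
            separators.append(sep)      # new clique hangs off a separator
        cliques.append(sep | {v})
        order.append(v)
        remaining.remove(v)

    return cliques, separators, order
```

With `max_clique_size=2` every clique is an edge and the procedure reduces to Prim's algorithm for a maximum spanning tree on `W`. Because each vertex is attached to a subset of an existing clique, reversing `order` gives a perfect elimination ordering for the resulting graph.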
