Log-linear models are a family of probability distributions that capture relationships between variables. They have proven useful in a wide variety of fields, such as epidemiology, economics and sociology. Their appeal lies in their ability to capture context-specific independencies, which give the model a richer structure. Many approaches exist for automatically learning the independence structure of log-linear models from data. However, the methods for evaluating these approaches are limited, and most rely on indirect measures based on the complete density of the probability distribution. Computing such measures requires additionally learning the numerical parameters of the distribution, which introduces distortions when the goal is to compare structures. This work addresses the issue by presenting the first measure for the direct and efficient comparison of the independence structures of log-linear models. Our method relies only on the independence structure of the models. This is useful when the goal is to extract knowledge from that structure or to compare the performance of structure learning algorithms, among other possible uses. We prove that the measure is a metric, and we present a method for computing it that is efficient in the number of variables of the domain.