Graphical models are useful tools for describing structured high-dimensional probability distributions. The development of efficient algorithms for learning graphical models from the least amount of data remains an active research topic. Reconstruction of graphical models that describe the statistics of discrete variables is a particularly challenging problem, for which the maximum likelihood approach is intractable. In this work, we provide the first sample-efficient method based on the Interaction Screening framework that provably learns fully general discrete factor models with node-specific discrete alphabets and multi-body interactions, specified in an arbitrary basis. We identify a single condition related to model parametrization that yields rigorous guarantees on the recovery of model structure and parameters in any error norm, and that is readily verifiable for a large class of models. Importantly, our bounds make an explicit distinction between parameters that are proper to the model and priors that are used as an input to the algorithm. Finally, we show that the Interaction Screening framework includes all models previously considered in the literature as special cases, for which our analysis shows a systematic improvement in sample complexity.
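For intuition, a minimal sketch of the Interaction Screening idea in its simplest, previously studied special case of pairwise binary (Ising) models may be helpful; the notation here ($n$ samples $\sigma^{(k)} \in \{-1,+1\}^p$, couplings $J_{uj}$, field $H_u$) is introduced purely for illustration and is not the general parametrization treated in this work. For each node $u$, one forms the convex Interaction Screening Objective
\[
\mathcal{S}_n(J_{u*}, H_u) \;=\; \frac{1}{n}\sum_{k=1}^{n} \exp\!\Big(-\sigma_u^{(k)}\big(\textstyle\sum_{j\neq u} J_{uj}\,\sigma_j^{(k)} + H_u\big)\Big),
\]
and estimates the couplings incident to $u$ by minimizing $\mathcal{S}_n$ with an $\ell_1$ penalty on $J_{u*}$. The present work generalizes this construction to node-specific discrete alphabets and multi-body interactions expressed in an arbitrary basis.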