We consider the problem of learning a tree-structured Ising model from data, such that subsequent predictions computed using the model are accurate. Concretely, we aim to learn a model such that posteriors $P(X_i|X_S)$ for small sets of variables $S$ are accurate. Since its introduction more than 50 years ago, the Chow-Liu algorithm, which efficiently computes the maximum likelihood tree, has been the benchmark algorithm for learning tree-structured graphical models. A bound on the sample complexity of the Chow-Liu algorithm with respect to the prediction-centric local total variation loss was shown in [BK19]. While those results demonstrated that it is possible to learn a useful model even when recovering the true underlying graph is impossible, their bound depends on the maximum strength of interactions and thus does not achieve the information-theoretic optimum. In this paper, we introduce a new algorithm that carefully combines elements of the Chow-Liu algorithm with tree metric reconstruction methods to efficiently and optimally learn tree Ising models under a prediction-centric loss. Our algorithm is robust to model misspecification and adversarial corruptions. In contrast, we show that the celebrated Chow-Liu algorithm can be arbitrarily suboptimal.
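For context, the Chow-Liu algorithm referenced above reduces maximum-likelihood tree learning to a maximum-weight spanning tree computation over empirical pairwise mutual informations. The sketch below is a minimal illustration of that classical reduction only, not the algorithm proposed in this paper; it assumes $\pm 1$-valued samples stored in a NumPy array and uses networkx for the spanning-tree step, and all function and variable names are illustrative.

```python
# Minimal sketch of the classical Chow-Liu procedure on +/-1 samples (illustrative only).
import numpy as np
import networkx as nx

def empirical_mutual_information(x, y):
    """Plug-in estimate of I(X;Y) in nats for two +/-1-valued sample vectors."""
    mi = 0.0
    for a in (-1, 1):
        for b in (-1, 1):
            p_ab = np.mean((x == a) & (y == b))
            p_a, p_b = np.mean(x == a), np.mean(y == b)
            if p_ab > 0:
                mi += p_ab * np.log(p_ab / (p_a * p_b))
    return mi

def chow_liu_tree(samples):
    """Return the maximum-likelihood tree structure: a maximum-weight spanning
    tree under empirical pairwise mutual-information edge weights."""
    _, p = samples.shape
    g = nx.Graph()
    for i in range(p):
        for j in range(i + 1, p):
            w = empirical_mutual_information(samples[:, i], samples[:, j])
            g.add_edge(i, j, weight=w)
    return nx.maximum_spanning_tree(g)
```

As a usage note, calling `chow_liu_tree` on an $(n \times p)$ sample matrix returns the estimated tree as a networkx graph; fitting edge parameters (e.g., by matching empirical pairwise correlations along the tree) is a separate step not shown here.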