Probabilistic Circuits (PCs) are a general and unified computational framework for tractable probabilistic models (TPMs) that support efficient computation of various inference tasks (e.g., computing marginal probabilities). Towards enabling such reasoning capabilities in complex real-world tasks, Liu et al. (2022) propose to distill knowledge (through latent variable assignments) from less tractable but more expressive deep generative models. However, it is still unclear what factors make this distillation work well. In this paper, we theoretically and empirically discover that the performance of a PC can exceed that of its teacher model. Therefore, instead of performing distillation from the most expressive deep generative model, we study what properties the teacher model and the PC should have in order to achieve good distillation performance. This leads to a generic algorithmic improvement, as well as data-type-specific ones, over the existing latent variable distillation pipeline. Empirically, we outperform SoTA TPMs by a large margin on challenging image modeling benchmarks. In particular, on ImageNet32, PCs achieve 4.06 bits-per-dimension, which is only 0.34 bits behind variational diffusion models (Kingma et al., 2021).
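To make the tractability claim concrete, below is a minimal sketch (our illustration, not the paper's implementation) of why marginal inference is efficient in a PC: marginalizing a variable amounts to replacing its leaf distributions with 1 and evaluating the circuit in a single bottom-up pass, with no explicit summation over that variable's states.

```python
# Toy PC over two binary variables X1, X2 (a sum node with weights
# 0.6 / 0.4 over two product nodes with Bernoulli leaves):
#   p(x1, x2) = 0.6 * B(x1; 0.8) * B(x2; 0.3)
#             + 0.4 * B(x1; 0.2) * B(x2; 0.9)

def bernoulli(x, theta):
    # Leaf distribution; x=None means the variable is marginalized
    # out, so the leaf evaluates to 1 (it sums to one over {0, 1}).
    if x is None:
        return 1.0
    return theta if x == 1 else 1.0 - theta

def pc(x1, x2):
    # Product nodes combine leaves over disjoint variable scopes;
    # the sum node mixes them with nonnegative weights summing to 1.
    left = bernoulli(x1, 0.8) * bernoulli(x2, 0.3)
    right = bernoulli(x1, 0.2) * bernoulli(x2, 0.9)
    return 0.6 * left + 0.4 * right

print(pc(1, 0))     # joint probability p(X1=1, X2=0) = 0.344
print(pc(1, None))  # marginal p(X1=1) = pc(1, 0) + pc(1, 1) = 0.56
```

The same one-pass evaluation scales to circuits with millions of parameters, which is what makes PCs "tractable" in contrast to the deep generative teacher models they distill from.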