Despite extensive progress on image generation, deep generative models are suboptimal when applied to lossless compression. For example, models such as VAEs suffer from a compression cost overhead due to their latent variables, which can only be partially eliminated with elaborate schemes such as bits-back coding, often resulting in poor single-sample compression rates. To overcome such problems, we establish a new class of tractable lossless compression models that permit efficient encoding and decoding: Probabilistic Circuits (PCs). These are a class of neural networks involving $|p|$ computational units that support efficient marginalization over arbitrary subsets of the $D$ feature dimensions, enabling efficient arithmetic coding. We derive efficient encoding and decoding schemes that both have time complexity $\mathcal{O} (\log(D) \cdot |p|)$, whereas a naive scheme would have cost linear in both $D$ and $|p|$, making the approach highly scalable. Empirically, our PC-based (de)compression algorithm runs 5-20x faster than neural compression algorithms that achieve similar bitrates. By scaling up the traditional PC structure learning pipeline, we achieve state-of-the-art results on image datasets such as MNIST. Furthermore, PCs can be naturally integrated with existing neural compression algorithms to improve the performance of these base models on natural image datasets. Our results highlight the potential impact that non-standard learning architectures may have on neural data compression.
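To make the link between marginalization and arithmetic coding concrete, the following is a minimal sketch of the naive scheme, not the $\mathcal{O} (\log(D) \cdot |p|)$ algorithm derived in the paper. It assumes a toy tractable model (a small mixture of fully factorized distributions, with made-up parameters) in place of a learned PC. All names (`WEIGHTS`, `COMPONENTS`, `prefix_marginal`, `conditional`, `encode`, `decode`) are hypothetical and chosen for illustration. Each conditional $p(x_j \mid x_{<j})$ is a ratio of two prefix marginals, and exact fractions stand in for the finite-precision arithmetic of a practical coder.

```python
from fractions import Fraction
import math

# Toy tractable model (an assumption for illustration, not a learned PC):
# a mixture of two fully factorized distributions over D = 3 binary variables.
WEIGHTS = [Fraction(1, 2), Fraction(1, 2)]
# COMPONENTS[k][i][v] = probability that variable i takes value v in component k.
COMPONENTS = [
    [[Fraction(3, 4), Fraction(1, 4)],
     [Fraction(1, 2), Fraction(1, 2)],
     [Fraction(1, 4), Fraction(3, 4)]],
    [[Fraction(1, 4), Fraction(3, 4)],
     [Fraction(1, 4), Fraction(3, 4)],
     [Fraction(1, 2), Fraction(1, 2)]],
]
D = 3
NUM_VALUES = 2


def prefix_marginal(prefix):
    """p(x_1..x_j): variables after the prefix are summed out (factor 1)."""
    total = Fraction(0)
    for weight, component in zip(WEIGHTS, COMPONENTS):
        term = weight
        for i, value in enumerate(prefix):
            term *= component[i][value]
        total += term
    return total


def conditional(prefix):
    """p(X_j = v | prefix) for every v, as a ratio of two marginals."""
    denom = prefix_marginal(prefix)
    return [prefix_marginal(prefix + [v]) / denom for v in range(NUM_VALUES)]


def encode(x):
    """Narrow [0, 1) to the sub-interval [low, low + width) that identifies x."""
    low, width = Fraction(0), Fraction(1)
    for j, value in enumerate(x):
        probs = conditional(list(x[:j]))
        low += width * sum(probs[:value], Fraction(0))
        width *= probs[value]
    return low, width


def decode(point, d):
    """Recover d symbols from any point inside the interval produced by encode."""
    low, width = Fraction(0), Fraction(1)
    out = []
    for _ in range(d):
        probs = conditional(out)
        cum = Fraction(0)
        for value, p in enumerate(probs):
            if low + width * cum <= point < low + width * (cum + p):
                low += width * cum
                width *= p
                out.append(value)
                break
            cum += p
    return out


x = [0, 1, 1]
low, width = encode(x)
# The final interval width equals p(x), so roughly -log2 p(x) bits identify it.
ideal_bits = -math.log2(float(width))
recovered = decode(low + width / 2, D)
```

In this sketch every symbol triggers fresh marginal queries over the whole model, which is where the naive cost linear in both $D$ and $|p|$ comes from. The final interval width equals $p(x)$, so the ideal code length is $-\log_2 p(x)$ bits.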