Many data compressors regularly encode probability distributions for entropy coding, which requires minimum description length (MDL)-type optimization. Canonical prefix (Huffman) coding usually writes only the lengths of the bit sequences, thereby approximating probabilities with powers of 2. Operating on more accurate probabilities usually allows for better compression ratios, which is possible e.g. with arithmetic coding and the Asymmetric Numeral Systems (ANS) family. In particular, the tabled variant of the latter (tANS) often replaces Huffman coding because it offers better compression at similar computational cost, e.g. in the popular Facebook Zstandard and Apple LZFSE compressors. Encoding of probability distributions for such applications is discussed, especially a Pyramid Vector Quantizer (PVQ)-based approach with deformation, as well as a tuned symbol spread for tANS.
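To illustrate why powers-of-2 probabilities cost compression, the sketch below compares three quantities for one example distribution: the entropy, the average Huffman code length (equivalently, the cross-entropy under the implied probabilities 2^-L), and the cross-entropy under probabilities quantized to integer counts summing to a table size, as tANS uses. This is a minimal sketch under stated assumptions: the helper names (`huffman_lengths`, `quantize_counts`, `cross_entropy`), the example distribution and the table size of 16 are hypothetical, and the greedy quantizer is a simple swapped-in technique, not the PVQ-based approach with deformation discussed here.

```python
import heapq
import math


def huffman_lengths(probs):
    """Huffman code length in bits for each symbol (hypothetical helper)."""
    if len(probs) == 1:
        return [1]
    # Heap entries: (weight, unique tie-break id, symbols in this subtree).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    next_id = len(probs)
    while len(heap) > 1:
        w1, _, s1 = heapq.heappop(heap)
        w2, _, s2 = heapq.heappop(heap)
        # Merging two subtrees adds one bit to every symbol inside them.
        for s in s1 + s2:
            lengths[s] += 1
        heapq.heappush(heap, (w1 + w2, next_id, s1 + s2))
        next_id += 1
    return lengths


def quantize_counts(probs, total):
    """Integer counts >= 1 summing to `total` that minimize cross-entropy.

    Greedy: repeatedly give +1 to the symbol whose increment lowers
    -sum p_i log2(c_i / total) the most. The objective is separable and
    concave in each count, so this greedy choice is optimal.
    Assumes all probabilities are positive and total >= len(probs).
    """
    n = len(probs)
    assert total >= n
    counts = [1] * n
    # Marginal gain of c -> c + 1 is p * log2((c + 1) / c); equals p at c = 1.
    heap = [(-p, i) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    for _ in range(total - n):
        _, i = heapq.heappop(heap)
        counts[i] += 1
        c = counts[i]
        heapq.heappush(heap, (-probs[i] * math.log2((c + 1) / c), i))
    return counts


def cross_entropy(p, q):
    """Average bits per symbol when data ~ p is coded with probabilities q."""
    return sum(-pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)


if __name__ == "__main__":
    probs = [0.5, 0.3, 0.15, 0.05]  # hypothetical example distribution
    table_size = 16                 # hypothetical tANS table size

    lengths = huffman_lengths(probs)
    counts = quantize_counts(probs, table_size)

    print("entropy        :", cross_entropy(probs, probs))
    print("Huffman lengths:", lengths,
          "avg bits:", cross_entropy(probs, [2.0 ** -l for l in lengths]))
    print("tANS-style counts:", counts,
          "bits:", cross_entropy(probs, [c / table_size for c in counts]))
```

For this distribution the Huffman code spends 1.7 bits per symbol, while the quantized counts come closer to the entropy even with a table of only 16 entries; the remaining gap is the price of quantization, which is what a better encoding of the distribution aims to trade off against its own description cost.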