Many data compressors regularly encode probability distributions for entropy coding, which calls for minimum description length (MDL) type optimization. Canonical prefix (Huffman) coding usually just stores the lengths of the bit sequences, thereby approximating probabilities with powers of 2. Operating on more accurate probabilities usually allows better compression ratios, and is possible using, e.g., arithmetic coding or the Asymmetric Numeral Systems (ANS) family. In particular, the multiplication-free tabled variant of the latter (tANS) builds an automaton that often replaces Huffman coding thanks to better compression at similar computational cost, e.g. in the popular Facebook Zstandard and Apple LZFSE compressors. This work discusses the encoding of probability distributions for such applications, especially a Pyramid Vector Quantizer (PVQ)-based approach with deformation, as well as bucket approximation, prefix trees, improving accuracy with additional bits, and a tuned symbol spread for tANS.
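
To make the power-of-2 remark concrete, the following minimal sketch compares the per-symbol penalty (Kullback-Leibler divergence, in bits) of two approximations of the same distribution: the probabilities 2^(-length) implied by Huffman code lengths, and counts that sum to a table size L, as a tANS-style coder would use. The example distribution, the table size `L = 64`, and the helper names are illustrative assumptions; in particular, `quantize_counts` is a naive rounding heuristic chosen for brevity, not the PVQ-based quantizer discussed in this work.

```python
import heapq
import math


def kl_bits(p, q):
    """Kullback-Leibler divergence D(p || q) in bits per symbol."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def huffman_lengths(p):
    """Return Huffman code lengths for probabilities p (needs len(p) >= 2)."""
    # Heap entries: (weight, unique tie-breaker, symbols in this subtree).
    heap = [(pi, i, [i]) for i, pi in enumerate(p)]
    heapq.heapify(heap)
    lengths = [0] * len(p)
    next_id = len(p)
    while len(heap) > 1:
        w1, _, s1 = heapq.heappop(heap)
        w2, _, s2 = heapq.heappop(heap)
        # Every symbol under the merged node moves one level deeper.
        for s in s1 + s2:
            lengths[s] += 1
        heapq.heappush(heap, (w1 + w2, next_id, s1 + s2))
        next_id += 1
    return lengths


def quantize_counts(p, L):
    """Naive quantization of p to positive integer counts summing to L.

    Illustrative heuristic only (assumes L >= len(p)): round each p_i * L,
    keep every count at least 1, then fix the total on the largest count.
    """
    counts = [max(1, round(pi * L)) for pi in p]
    while sum(counts) != L:
        i = counts.index(max(counts))
        counts[i] += 1 if sum(counts) < L else -1
    return counts


# Hypothetical example distribution and table size.
p = [0.6, 0.2, 0.1, 0.05, 0.05]
L = 64

lengths = huffman_lengths(p)
q_huffman = [2.0 ** -l for l in lengths]   # powers-of-2 approximation
counts = quantize_counts(p, L)
q_counts = [c / L for c in counts]         # multiples of 1/L

print("Huffman lengths:", lengths)
print("penalty with powers of 2 [bits/symbol]:", kl_bits(p, q_huffman))
print("counts for L = 64:", counts)
print("penalty with counts/L   [bits/symbol]:", kl_bits(p, q_counts))
```

For this skewed example the count-based approximation loses noticeably fewer bits per symbol than the power-of-2 one, but the counts themselves must be stored in the header, which is the description-length trade-off this work addresses.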