Asymmetric Numeral Systems (ANS) is a class of entropy encoders that has had an immense impact on data compression, largely supplanting arithmetic and Huffman coding. It has been studied by various authors, but the precise asymptotics of its redundancy (relative to the entropy) were not completely understood. We obtain optimal bounds for the redundancy of the tabled ANS (tANS), the most popular ANS variant. Given a sequence $a_1,a_2,\ldots,a_n$ of symbols from an alphabet $\{0,1,\ldots,\sigma-1\}$ such that each symbol $a$ occurs in it $f_a$ times and $n=2^r$, the tANS encoder using Duda's ``precise initialization'' to fill the tANS tables transforms this sequence into a bit string of the following length (the frequencies are not included in the encoding): $\sum\limits_{a\in[0..\sigma)}f_a\cdot\log\frac{n}{f_a}+O(\sigma+r)$, where $O(\sigma+r)$ can be bounded by $\sigma\log e+r$. The $r$-bit term is an artifact indispensable to ANS; the rest incurs a redundancy of $O(\frac{\sigma}{n})$ bits per symbol. We complement this with examples showing that an $\Omega(\sigma+r)$ redundancy is necessary. We argue that similar examples exist for most reasonable initialization methods for tANS. Thus, we refute Duda's conjecture that the redundancy is $O(\frac{\sigma}{n^2})$ bits per symbol. We also propose a variant of the range ANS (rANS), called rANS with fixed accuracy, parameterized by $k\ge 1$. In this variant the integer division, which is unavoidable in rANS, is performed only when its result belongs to $[2^k..2^{k+1})$; hence, the division can be computed by faster methods provided $k$ is small. We bound the redundancy of this rANS variant by $\frac{n}{2^k-1}\log e+r$.
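To make the stated tANS bound concrete, the following is a minimal Python sketch that evaluates the two terms of the bound, the empirical-entropy sum $\sum_{a} f_a\log\frac{n}{f_a}$ and the $\sigma\log e+r$ redundancy term, for an illustrative frequency vector; the function name and the toy frequencies are hypothetical, not part of the paper.

```python
import math

def tans_length_bound(freqs):
    """Evaluate the code-length bound from the abstract:
    sum_a f_a * log2(n / f_a) + sigma * log2(e) + r, with n = sum(freqs) = 2^r.

    `freqs` lists the per-symbol counts f_a; symbols with f_a = 0 are assumed
    to be absent from the alphabet, and the frequencies themselves are not
    counted in the encoding length."""
    n = sum(freqs)
    r = n.bit_length() - 1
    assert n == 1 << r, "the bound is stated for n = 2^r"
    sigma = len(freqs)
    empirical_entropy = sum(f * math.log2(n / f) for f in freqs)
    redundancy = sigma * math.log2(math.e) + r
    return empirical_entropy, empirical_entropy + redundancy

# Toy example: n = 2^10 symbols over a 4-letter alphabet.
entropy_bits, bound_bits = tans_length_bound([512, 256, 128, 128])
print(f"empirical entropy: {entropy_bits:.1f} bits, length bound: {bound_bits:.1f} bits")
```

For these toy frequencies the empirical-entropy term is 1792 bits, so the $\sigma\log e+r\approx 15.8$ extra bits amount to well under $0.02$ bits of redundancy per symbol, consistent with the $O(\frac{\sigma}{n})$ per-symbol figure quoted above.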