Grammar compression is, next to Lempel-Ziv (LZ77) and the run-length Burrows-Wheeler transform (RLBWT), one of the most flexible approaches to representing and processing highly compressible strings. The main idea is to represent a text as a context-free grammar whose language consists of exactly the input string. Such a grammar is called a straight-line grammar (SLG). An AVL grammar, proposed by Rytter [Theor. Comput. Sci., 2003], is a type of SLG that additionally satisfies the AVL property: for every nonterminal, the heights of the parse trees of its children differ by at most one. In contrast to other SLG constructions, AVL grammars can be constructed from the LZ77 parsing in compressed time, namely $\mathcal{O}(z \log n)$, where $z$ is the size of the LZ77 parsing and $n$ is the length of the input text. Despite these advantages, AVL grammars are thought to be too large to be practical. We present a new technique for rapidly constructing a small AVL grammar from an LZ77 or LZ77-like parse. Our algorithm produces grammars that are always at least five times smaller than those produced by the original algorithm, and never more than double the size of grammars produced by the practical Re-Pair compressor [Larsson and Moffat, Proc. IEEE, 2000]. Our algorithm also achieves low peak RAM usage. By combining this algorithm with recent advances in approximating the LZ77 parsing, we show that our method has the potential to construct a run-length BWT from an LZ77 parse in about one third of the time and peak RAM required by other approaches. Overall, we show that AVL grammars are surprisingly practical, opening the door to much faster construction of key compressed data structures.
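To make the two definitions above concrete, the following minimal Python sketch encodes a toy straight-line grammar and checks the AVL property on it. It is only an illustration under stated assumptions: the rule encoding (a dict mapping each nonterminal to either a single character or a pair of nonterminals), the example grammars, and the helper names `expand`, `heights`, and `is_avl` are hypothetical and are not taken from the paper or from Rytter's construction.

```python
# Hypothetical toy SLG for the string "abababab". Each nonterminal maps either
# to a single character (terminal rule) or to a pair of nonterminals.
BALANCED = {
    "A": "a",
    "B": "b",
    "X1": ("A", "B"),    # derives "ab"
    "X2": ("X1", "X1"),  # derives "abab"
    "X3": ("X2", "X2"),  # derives "abababab"
}

# Hypothetical SLG for "aaaa" whose parse tree is a left-leaning chain.
SKEWED = {
    "A": "a",
    "L1": ("A", "A"),
    "L2": ("L1", "A"),
    "L3": ("L2", "A"),
}


def expand(rules, symbol):
    """Return the unique string derived from `symbol`."""
    rule = rules[symbol]
    if isinstance(rule, str):
        return rule
    left, right = rule
    return expand(rules, left) + expand(rules, right)


def heights(rules):
    """Compute the parse-tree height of every nonterminal (terminal rules have height 0)."""
    memo = {}

    def height(symbol):
        if symbol not in memo:
            rule = rules[symbol]
            if isinstance(rule, str):
                memo[symbol] = 0
            else:
                memo[symbol] = 1 + max(height(rule[0]), height(rule[1]))
        return memo[symbol]

    for symbol in rules:
        height(symbol)
    return memo


def is_avl(rules):
    """Check that the two children of every binary rule differ in height by at most one."""
    h = heights(rules)
    return all(
        isinstance(rule, str) or abs(h[rule[0]] - h[rule[1]]) <= 1
        for rule in rules.values()
    )


if __name__ == "__main__":
    print(expand(BALANCED, "X3"), is_avl(BALANCED))
    print(expand(SKEWED, "L3"), is_avl(SKEWED))
```

In this sketch both grammars are valid SLGs, since each start symbol derives exactly one string, but only the first satisfies the AVL property: in the second, the rule for `L3` pairs a child of height 2 with a child of height 0.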