Grammar-based compression is a lossless data compression scheme that represents a given string $w$ by a context-free grammar that generates only $w$. While computing the smallest grammar that generates a given string $w$ is NP-hard in general, a number of polynomial-time grammar-based compressors that work well in practice have been proposed. RePair, proposed by Larsson and Moffat in 1999, is a grammar-based compressor that recursively replaces all possible occurrences of a most frequently occurring bigram in the string. Since there can be multiple choices of a most frequent bigram to replace, different implementations of RePair can produce different grammars. In this paper, we show that the smallest grammars generating the Fibonacci words $F_k$ can be completely characterized by RePair, where $F_k$ denotes the $k$-th Fibonacci word. Namely, every grammar for $F_k$ generated by any implementation of RePair is a smallest grammar for $F_k$, and no other grammar is a smallest grammar for $F_k$. To the best of our knowledge, Fibonacci words are the first non-trivial infinite family of strings for which RePair is optimal.
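To make the replacement procedure concrete, the following is a minimal Python sketch of a naive RePair-style compressor applied to Fibonacci words. It is an illustration under stated assumptions, not the implementation analyzed in the paper: the indexing $F_0 = b$, $F_1 = a$, $F_k = F_{k-1}F_{k-2}$ is assumed, bigram frequency is counted over non-overlapping occurrences, ties between most frequent bigrams are broken arbitrarily, and all function names and the nonterminal names `X0`, `X1`, ... are hypothetical. It runs in quadratic time, whereas practical implementations use more careful data structures.

```python
from collections import Counter


def fibonacci_word(k):
    # Assumed indexing: F_0 = "b", F_1 = "a", F_k = F_{k-1} + F_{k-2}.
    prev, cur = "b", "a"
    if k == 0:
        return prev
    for _ in range(k - 1):
        prev, cur = cur, cur + prev
    return cur


def count_bigrams(seq):
    # Count non-overlapping occurrences of each bigram, scanning left to right.
    counts = Counter()
    last_start = {}
    for i in range(len(seq) - 1):
        pair = (seq[i], seq[i + 1])
        if last_start.get(pair) == i - 1:
            continue  # overlaps the previously counted occurrence (e.g. "aaa")
        counts[pair] += 1
        last_start[pair] = i
    return counts


def repair(text):
    # Returns (final sequence, rules), where rules maps a nonterminal to a bigram.
    seq = list(text)
    rules = {}
    while True:
        counts = count_bigrams(seq)
        if not counts:
            break
        pair, freq = counts.most_common(1)[0]  # arbitrary tie-breaking
        if freq < 2:
            break  # no bigram repeats, so no further replacement helps
        name = f"X{len(rules)}"  # hypothetical nonterminal name
        rules[name] = pair
        out, i = [], 0
        while i < len(seq):
            # Replace occurrences greedily from left to right.
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(name)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules


def expand(symbol, rules):
    # Recursively expand a symbol back into the terminal string it derives.
    if symbol not in rules:
        return symbol
    left, right = rules[symbol]
    return expand(left, rules) + expand(right, rules)


def grammar_size(seq, rules):
    # Size measured as the total length of all right-hand sides.
    return len(seq) + 2 * len(rules)


if __name__ == "__main__":
    w = fibonacci_word(7)
    seq, rules = repair(w)
    print(w, seq, rules, grammar_size(seq, rules))
```

Changing the tie-breaking rule in `repair` (for example, preferring the last most frequent bigram instead of the first) yields a different grammar for the same input, which is the source of the multiple RePair grammars discussed above.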