We revisit the problem of minimal local grammar-based coding. In this setting, the local grammar encoder encodes grammars symbol by symbol, whereas the minimal grammar transform minimizes the grammar length in a preset class of grammars as given by the length of local grammar encoding. It is known that such minimal codes are strongly universal for a strictly positive entropy rate, whereas the number of rules in the minimal grammar constitutes an upper bound for the mutual information of the source. Whereas the fully minimal code is likely intractable, the constrained minimal block code can be efficiently computed. In this note, we present a new, simpler, and more general proof of strong universality of the minimal block code, regardless of the entropy rate. The proof is based on a simple Zipfian bound for ranked probabilities. By the way, we also show empirically that the number of rules in the minimal block code cannot clearly discriminate between long-memory and memoryless sources, such as a text in English and a random permutation of its characters. This contradicts our previous expectations.
翻译:我们重新审视了以语法为基础的本地最小编码问题。 在此设置中, 本地语法编码编码将语法符号编码成符号, 而最小语法变换将本地语法编码长度给定的预设语法分类中的语法长度最小化。 众所周知, 这种最低代码对于绝对正数的加密率非常普遍, 而最低语法规则的数量对于源的相互信息来说是上界的。 虽然完全最低代码可能是难以操作的, 但有限的最小区块编码是可以有效计算的。 在本说明中, 我们提出了一个新的、 更简单、 更笼统的证据证明最小区块编码具有很强的普遍性, 不论英译率如何 。 证据基于简单的Zipfian 约束的概率等级。 顺便说一句, 我们还从经验上表明, 最低语法规则的数量无法明确区分长模和不记忆的源, 如英文文本和其字符的随机拼写。 这与我们以前的期望相矛盾 。