We formalize the tail redundancy of a collection $\mathcal P$ of distributions over a countably infinite alphabet, and show that this quantity characterizes the asymptotic per-symbol redundancy of universally compressing sequences generated iid from $\mathcal P$. In contrast to worst-case formulations of universal compression, finite single-letter (average-case) redundancy of $\mathcal P$ does not automatically imply that the expected redundancy of describing length-$n$ strings sampled iid from $\mathcal P$ grows sublinearly with $n$. Instead, we prove that universal compression of length-$n$ iid sequences from $\mathcal P$ is characterized by how well the tails of distributions in $\mathcal P$ can be universally described: the asymptotic per-symbol redundancy of iid strings is equal to the tail redundancy.
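To make the statement concrete, the quantities involved can be sketched in symbols. The notation $R_n$, $q$, and $\mathcal T$ is assumed here for illustration, taking the alphabet to be $\mathbb N$ and measuring redundancy as expected (Kullback-Leibler) excess codelength. In particular, the form given for the tail redundancy is an assumption and may differ in detail from the formal definition.

```latex
% Expected redundancy of length-n iid strings from \mathcal P:
% q ranges over distributions on \mathbb N^n, p^n is the n-fold product of p.
\[
  R_n(\mathcal P)
    = \inf_{q} \sup_{p \in \mathcal P}
      \sum_{x^n \in \mathbb N^n} p^n(x^n) \log \frac{p^n(x^n)}{q(x^n)}
\]

% Tail redundancy (assumed form): how well the mass beyond a cutoff m
% can be described by a single q, in the limit of large m.
\[
  \mathcal T(\mathcal P)
    = \lim_{m \to \infty} \inf_{q} \sup_{p \in \mathcal P}
      \sum_{x \ge m} p(x) \log \frac{p(x)}{q(x)}
\]

% The main claim, in this notation:
\[
  \lim_{n \to \infty} \frac{1}{n} R_n(\mathcal P) = \mathcal T(\mathcal P)
\]
```

In this notation, finite single-letter redundancy means $R_1(\mathcal P) < \infty$, and sublinear growth of the expected redundancy means $R_n(\mathcal P)/n \to 0$, which by the last display would correspond to $\mathcal T(\mathcal P) = 0$.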