Universalak 弱变数- Length 不可计数的无限字母法源代码 (Universal Weak Variable-Length Source Coding on Countable Infinite Alphabets)

from arxiv, This article has been accepted for publication by IEEE. Digital Object Identifier 10.1109/TIT.2019.2941895. Link: https://ieeexplore.ieee.org/document/8840879. The material in this paper was partially published in ISIT2016 [1] and ISIT2017 [2], International Symposium on Information Theory (ISIT)

Motivated from the fact that universal source coding on countably infinite alphabets is not feasible, this work introduces the notion of almost lossless source coding. Analog to the weak variable-length source coding problem studied by Han (IEEE TIT, 2000, 46, 1217-1226), almost lossless source coding aims at relaxing the lossless block-wise assumption to allow an average per-letter distortion that vanishes asymptotically as the block-length tends to infinity. In this setup, we show on one hand that Shannon entropy characterizes the minimum achievable rate (similarly to the case of finite alphabet sources) while on the other that almost lossless universal source coding becomes feasible for the family of finite-entropy stationary memoryless sources with infinite alphabets. Furthermore, we study a stronger notion of almost lossless universality that demands uniform convergence of the average per-letter distortion to zero, where we establish a necessary and sufficient condition for the so-called family of envelope distributions to achieve it. Remarkably, this condition is the same necessary and sufficient condition needed for the existence of a strongly minimax (lossless) universal source code for the family of envelope distributions. Finally, we show that an almost lossless coding scheme offers faster rate of convergence for the (minimax) redundancy compared to the well-known information radius developed for the lossless case at the expense of tolerating a non-zero distortion that vanishes to zero as the block-length grows. This shows that even when lossless universality is feasible, an almost lossless scheme can offer different regimes on the rates of convergence of the (worst case) redundancy versus the (worst case) distortion.

翻译：这项工作源于一个事实,即对可观的无限字母的通用源码编码不可行,它引入了几乎无损的源码编码概念。对汉研究的(IEEE TIT, 2000, 46, 1217-1226)的微弱变量源码编码问题的分析,几乎无损源码编码旨在放松无损的全局假设,以便允许平均的单字母扭曲,因为整段长度往往会变得无限地消失。在这个设置中,我们一方面表明香农变形是最低可实现比率(类似于固定字母来源的情况),而另一方面,几乎无损的通用源码编码对于使用无限字母缩略图的固定记忆来源来说是可行的。此外,我们研究一个几乎无损的几乎无损的全局性概念,要求平均的每字母变形变形一致,因为我们为所谓的信封分配体系创造了一个必要和充分的条件。显而易见的是,这个条件与固定字母变形的变形变形的变形模式同样必要和充分的条件,这是存在一个几乎无法变形的缩的缩的缩源码。