This paper proposes an effective 'on-the-fly' mechanism for stochastic lossy coding of Markov sources using string-matching techniques. Earlier work has shown that the rate-distortion bound can be asymptotically achieved by a 'natural type selection' (NTS) mechanism, which iteratively encodes asymptotically long source strings (from an unknown source distribution P) and regenerates the codebook according to a maximum-likelihood distribution framework, after observing a set of K codewords that 'd-match' (i.e., satisfy the distortion constraint for) a respective set of K source words. This result was later generalized to sources with memory, under the assumption that the source words contain a sequence of asymptotic-length vectors (or super-symbols) over the source super-alphabet, i.e., the source is treated as a vector source. However, the earlier result suffers from a significant practical flaw: achieving the rate-distortion bound requires the super-symbol length (and, correspondingly, the super-alphabet) to grow to infinity, even for finite-memory sources such as Markov sources. The complexity of the NTS iteration therefore grows beyond any practical capability, compromising the promise of the NTS algorithm in practical scenarios for sources with memory. This work describes a considerably more efficient and tractable mechanism that achieves asymptotically optimal performance under a prescribed memory constraint, within a practical framework tailored to Markov sources. Specifically, the algorithm asymptotically finds the optimal codebook reproduction distribution within a constrained set of distributions having the Markov property of a prescribed order, namely the one that achieves the minimum per-letter coding rate while maintaining a specified distortion level.
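The NTS loop summarized in the abstract can be illustrated with a toy sketch. This is not the paper's algorithm, only one possible reading of it under loudly stated assumptions: a finite alphabet of integer symbols, per-letter Hamming distortion, short fixed-length words rather than asymptotically long ones, a first-order Markov codebook with a fixed uniform initial distribution, and a lightly smoothed count-based estimate standing in for the maximum-likelihood update. All function and parameter names (`sample_markov`, `first_d_match`, `ml_markov`, `nts_step`, `smooth`, `max_tries`) are hypothetical.

```python
import random


def sample_markov(init, trans, n, rng):
    """Draw a length-n string from a first-order Markov chain."""
    states = range(len(init))
    word = [rng.choices(states, weights=init)[0]]
    for _ in range(n - 1):
        word.append(rng.choices(states, weights=trans[word[-1]])[0])
    return word


def hamming(a, b):
    """Per-letter Hamming distortion between two equal-length words."""
    return sum(u != v for u, v in zip(a, b)) / len(a)


def first_d_match(source_word, init, trans, d, rng, max_tries=100000):
    """Generate codewords from the codebook chain until one d-matches."""
    for _ in range(max_tries):
        codeword = sample_markov(init, trans, len(source_word), rng)
        if hamming(source_word, codeword) <= d:
            return codeword
    raise RuntimeError("no d-match found within max_tries")


def ml_markov(words, size, smooth=1e-3):
    """Count-based transition estimate from the matched codewords.

    `smooth` is an assumption of this sketch: it keeps every row positive
    so that later iterations can still generate every transition.
    """
    counts = [[smooth] * size for _ in range(size)]
    for w in words:
        for a, b in zip(w, w[1:]):
            counts[a][b] += 1
    return [[c / sum(row) for c in row] for row in counts]


def nts_step(src_trans, cb_trans, n, d, K, rng):
    """One NTS iteration: collect K d-matching codewords, then re-estimate."""
    size = len(cb_trans)
    uniform = [1.0 / size] * size  # fixed initial distribution (assumption)
    matched = []
    while len(matched) < K:
        x = sample_markov(uniform, src_trans, n, rng)
        matched.append(first_d_match(x, uniform, cb_trans, d, rng))
    return ml_markov(matched, size)


if __name__ == "__main__":
    rng = random.Random(0)
    src = [[0.9, 0.1], [0.2, 0.8]]   # toy binary Markov source
    cb = [[0.5, 0.5], [0.5, 0.5]]    # initial codebook transition matrix
    for _ in range(5):
        cb = nts_step(src, cb, n=8, d=0.25, K=50, rng=rng)
    print(cb)
```

The point of the sketch is the constraint on the update: the codebook distribution is re-estimated only within the family of first-order Markov chains, so the cost per iteration depends on the prescribed order and not on a growing super-alphabet.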