In this paper, we study the problem of designing prefix-free encoding schemes having minimum average code length that can be decoded efficiently under a decode cost model that captures memory hierarchy induced cost functions. We also study a special case of this problem that is closely related to the length limited Huffman coding (LLHC) problem; we call this the {\em soft-length limited Huffman coding} problem. In this version, there is a penalty associated with each of the $n$ characters of the alphabet whose encodings exceed a specified bound $D$($\leq n$), where the penalty increases linearly with the length of the encoding beyond $D$. The goal of the problem is to find a prefix-free encoding having minimum average code length and total penalty within a pre-specified bound ${\cal P}$. This generalizes the LLHC problem. We present an algorithm to solve this problem that runs in time $O( nD )$. We study a further generalization in which the penalty function and the objective function can both be arbitrary monotonically non-decreasing functions of the codeword length. We provide dynamic programming based exact and PTAS algorithms for this setting.
翻译:在本文中,我们研究了设计最低平均代码长度的无前缀编码计划的问题,这些编码计划可以按照一个包含内存等级引致成本功能的解码成本模型有效解码。我们还研究了这一问题的一个特殊案例,它与Huffman编码(LLHC)的长度有限问题密切相关;我们称之为“软长限制Huffman编码”问题。在这个版本中,编码超过特定约束值的字母数超过1美元($leq n$)的字母中的每个美元字符都会受到处罚,因为编码长度随着编码长度的长度超过$D而线性地增加。问题的目的是在预先指定的美元范围内找到一个具有最低平均代码长度和总罚款的无前缀编码。我们提出了一个用于解决这一问题的算法,它以时间为$(nD)美元。我们研究进一步的一般化,即惩罚功能和目标功能可以是任意的单项非决定的。我们提供了基于该代码长度的动态算法的动态算法。