Huffman coding is well known to be useful in certain decision problems involving minimizing the average number of (freely chosen) queries to determine an unknown random variable. However, in problems where the queries are more constrained, the original Huffman coding no longer works. In this paper, we proposed a general model to describe such problems and two code schemes: one is Huffman-based, and the other called GBSC (Greedy Binary Separation Coding). We proved the optimality of GBSC by induction on a binary decision tree, telling us that GBSC is at least as good as Shannon coding. We then compared the two algorithms based on these two codes, by testing them with two problems: DNA detection and 1-player Battleship, and found both to be decent approximating algorithms, with Huffman-based algorithm giving an expected length 1.1 times the true optimal in DNA detection problem, and GBSC yielding an average number of queries 1.4 times the theoretical optimal in 1-player Battleship.
翻译:众所周知,Huffman编码在某些决策问题上非常有用,这些问题涉及尽量减少(自由选择的)查询的平均数量,以确定一个未知随机变量。然而,在查询比较受限的问题中,最初的Huffman编码不再起作用。在本文中,我们提出了一个描述这类问题的一般模式和两个代码方案:一个是以Huffman为基础的,另一个称为GBSC(Greedy Binary隔离编码)。我们通过在二进制决策树上介绍GBSC,证明GBSC是最佳的,告诉我们GBSC至少和香农编码一样好。然后,我们将基于这两个代码的两种算法进行比较,用两个问题来测试:DNA探测和一玩家战舰,发现这两种算法都是相当相似的,而基于Huffman的算法的预期长度为DNA探测问题真正最佳的1.1倍,而GBSC的查询平均次数是1玩家战役理论最佳的1.4倍。