The Run Length Encoding (RLE) compression method is a long standing simple lossless compression scheme which is easy to implement and achieves a good compression on input data which contains repeating consecutive symbols. In its pure form RLE is not applicable on natural text or other input data with short sequences of identical symbols. We present a combination of preprocessing steps that turn arbitrary input data in a byte-wise encoding into a bit-string which is highly suitable for RLE compression. The main idea is to first read all most significant bits of the input byte-string, followed by the second most significant bit, and so on. We combine this approach by a dynamic byte remapping as well as a Burrows-Wheeler-Scott transform on a byte level. Finally, we apply a Huffman Encoding on the output of the bit-wise RLE encoding to allow for more dynamic lengths of code words encoding runs of the RLE. With our technique we can achieve a lossless average compression which is better than the standard RLE compression by a factor of 8 on average.
翻译:运行长度编码( RLE) 压缩方法是一个长期的简单且不丢失的压缩方法, 它很容易执行, 并且能够对含有重复连续符号的输入数据进行良好的压缩。 纯格式的 RLE 不适用于自然文本或其他输入数据, 带有相同符号的短顺序。 我们展示了一个组合的预处理步骤, 将任意输入数据在字节编码中转换成一个非常适合 RLE 压缩的位字符串。 主要的想法是首先读取输入字节字符串中最重要的部分, 然后再读第二个最重要的部分等等。 我们通过动态的字节重新绘图以及字节级的 Burrows- Wheeler- Scott 转换, 将这个方法结合起来。 最后, 我们用一个 Huffman Ecoding 来对位式的 RLEE 编码输出进行输入, 以便让 RLE 的代码编码运行有更动态的长度。 我们的技术可以实现一个不损失的平均压缩, 比标准 RLE 压缩的系数平均8 更好 。