In many important applications -- such as search engines and relational database systems -- data is stored in the form of arrays of integers. Encoding and, most importantly, decoding of these arrays consumes considerable CPU time. Therefore, substantial effort has been made to reduce costs associated with compression and decompression. In particular, researchers have exploited the superscalar nature of modern processors and SIMD instructions. Nevertheless, we introduce a novel vectorized scheme called SIMD-BP128 that improves over previously proposed vectorized approaches. It is nearly twice as fast as the previously fastest schemes on desktop processors (varint-G8IU and PFOR). At the same time, SIMD-BP128 saves up to 2 bits per integer. For even better compression, we propose another new vectorized scheme (SIMD-FastPFOR) that has a compression ratio within 10% of a state-of-the-art scheme (Simple-8b) while being two times faster during decoding.
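To make the underlying idea concrete, here is a minimal scalar sketch of binary packing, the technique that bit-packing schemes such as SIMD-BP128 build on: when all integers in a block are known to fit in b bits, they can be stored using b bits each instead of 32. This is only an illustrative assumption for exposition, not the authors' implementation; SIMD-BP128 itself packs blocks of 128 integers at a time using 128-bit SIMD instructions rather than the word-at-a-time loop shown here.

```cpp
#include <cassert>
#include <cstdint>
#include <iostream>
#include <vector>

// Pack n integers, each assumed to fit in b bits, into a contiguous bit stream.
// Hypothetical scalar illustration of binary packing (not SIMD-BP128 itself).
std::vector<uint32_t> pack(const std::vector<uint32_t>& in, unsigned b) {
    assert(b >= 1 && b <= 32);
    std::vector<uint32_t> out((in.size() * b + 31) / 32, 0);
    size_t bitpos = 0;
    for (uint32_t v : in) {
        size_t word = bitpos / 32, offset = bitpos % 32;
        out[word] |= v << offset;
        if (offset + b > 32)                 // value straddles a word boundary
            out[word + 1] |= v >> (32 - offset);
        bitpos += b;
    }
    return out;
}

// Recover n integers of width b bits from the packed stream.
std::vector<uint32_t> unpack(const std::vector<uint32_t>& in, size_t n, unsigned b) {
    std::vector<uint32_t> out(n);
    uint32_t mask = (b == 32) ? 0xFFFFFFFFu : ((1u << b) - 1);
    size_t bitpos = 0;
    for (size_t i = 0; i < n; ++i) {
        size_t word = bitpos / 32, offset = bitpos % 32;
        uint32_t v = in[word] >> offset;
        if (offset + b > 32)                 // pull the spilled high bits
            v |= in[word + 1] << (32 - offset);
        out[i] = v & mask;
        bitpos += b;
    }
    return out;
}

int main() {
    std::vector<uint32_t> data(128);
    for (size_t i = 0; i < data.size(); ++i) data[i] = i % 32;  // values fit in 5 bits
    auto packed = pack(data, 5);              // 128 * 5 bits = 80 bytes instead of 512
    auto restored = unpack(packed, data.size(), 5);
    std::cout << (restored == data ? "round-trip ok" : "mismatch") << "\n";
}
```

In this sketch the savings come entirely from choosing b per block: a block whose largest value fits in 5 bits costs 5 bits per integer rather than 32. Vectorized schemes keep the same layout idea but decode many packed values per instruction, which is where the reported speedups come from.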