Compressed bitmap indexes are used in systems such as Git or Oracle to accelerate queries. They represent sets and often support operations such as unions, intersections, differences, and symmetric differences. Several important systems such as Elasticsearch, Apache Spark, Netflix's Atlas, LinkedIn's Pinot, Metamarkets' Druid, Pilosa, Apache Hive, Apache Tez, Microsoft Visual Studio Team Services and Apache Kylin rely on a specific type of compressed bitmap index called Roaring. We present an optimized software library written in C implementing Roaring bitmaps: CRoaring. It benefits from several algorithms designed for the single-instruction-multiple-data (SIMD) instructions available on commodity processors. In particular, we present vectorized algorithms to compute the intersection, union, difference and symmetric difference between arrays. We benchmark the library against a wide range of competitive alternatives, identifying weaknesses and strengths in our software. Our work is available under a liberal open-source license.
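To make the four array operations named above concrete, here is a minimal scalar sketch of the merge-style pass that a vectorized routine would aim to outperform. It is not CRoaring's code and uses no SIMD instructions. It assumes both inputs are sorted, duplicate-free arrays of 16-bit integers, and every name in it (`sorted_set_op`, the `KEEP_*` flags, the `OP_*` macros) is hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flags for this sketch: which elements a merge pass keeps. */
enum { KEEP_ONLY_A = 1, KEEP_ONLY_B = 2, KEEP_BOTH = 4 };

#define OP_INTERSECTION (KEEP_BOTH)
#define OP_UNION        (KEEP_ONLY_A | KEEP_ONLY_B | KEEP_BOTH)
#define OP_DIFFERENCE   (KEEP_ONLY_A)               /* a minus b */
#define OP_SYMDIFF      (KEEP_ONLY_A | KEEP_ONLY_B)

/* Scalar merge over two sorted, duplicate-free arrays (an assumption of
 * this sketch). `out` must have room for na + nb values and must not
 * alias `a` or `b`. Returns the number of values written to `out`. */
static size_t sorted_set_op(const uint16_t *a, size_t na,
                            const uint16_t *b, size_t nb,
                            int op, uint16_t *out) {
    assert(out != NULL);
    size_t i = 0, j = 0, n = 0;
    while (i < na && j < nb) {
        if (a[i] < b[j]) {            /* value present only in a */
            if (op & KEEP_ONLY_A) out[n++] = a[i];
            i++;
        } else if (b[j] < a[i]) {     /* value present only in b */
            if (op & KEEP_ONLY_B) out[n++] = b[j];
            j++;
        } else {                      /* value present in both */
            if (op & KEEP_BOTH) out[n++] = a[i];
            i++;
            j++;
        }
    }
    /* Whatever remains in one input is present only in that input. */
    if (op & KEEP_ONLY_A) while (i < na) out[n++] = a[i++];
    if (op & KEEP_ONLY_B) while (j < nb) out[n++] = b[j++];
    return n;
}
```

For a = {1, 3, 5, 7, 9} and b = {3, 4, 5, 10}, this sketch yields {3, 5} for the intersection, {1, 3, 4, 5, 7, 9, 10} for the union, {1, 7, 9} for the difference a minus b, and {1, 4, 7, 9, 10} for the symmetric difference. The loop handles one element per iteration with data-dependent branches, which is the kind of cost a SIMD formulation would try to reduce.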