Traditional indexing techniques commonly employed in database systems perform poorly on multidimensional array scientific data. Bitmap indices are widely used in commercial databases for processing complex queries, owing to their effective use of bitwise operations and their space efficiency. However, bitmap indices apply natively only to relational or linearized datasets, a limitation that is especially notable for binned or compressed indices. We propose a new method for multidimensional array indexing that overcomes these dimensionality-induced inefficiencies. The hierarchical indexing method is based on $n$-dimensional sparse trees for dimension partitioning, with a bounded number of individual, adaptively binned indices for attribute partitioning. This indexing performs well on range queries involving both dimensions and attributes, as it prunes the search space early, avoids reading the entire index data, and performs at most a single index traversal. Moreover, the indexing is easily extensible to membership queries. The indexing method was implemented on top of FastBit, a state-of-the-art bitmap indexing library. We show that the hierarchical bitmap index outperforms conventional bitmap indexing built on an auxiliary attribute for each dimension. Furthermore, the adaptive binning significantly reduces the number of bins and therefore the memory requirements.
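To make the notion of a binned bitmap index concrete, the following is a minimal sketch of a flat (non-hierarchical) binned index answering an attribute range query with bitwise operations. It is not the hierarchical method proposed here and does not use the FastBit API; the function names `build_binned_bitmaps` and `range_query`, the fixed bin edges, and the use of Python integers as bitsets are all assumptions made for illustration.

```python
from bisect import bisect_right


def build_binned_bitmaps(values, edges):
    """Build one bitmap per bin (hypothetical helper, not a FastBit call).

    Bin k holds values in [edges[k-1], edges[k]); bin 0 is open below and
    the last bin is open above. Each bitmap is a Python int used as a bitset:
    bit i is set when values[i] falls into that bin.
    """
    bitmaps = [0] * (len(edges) + 1)
    for i, v in enumerate(values):
        bitmaps[bisect_right(edges, v)] |= 1 << i
    return bitmaps


def range_query(values, edges, bitmaps, lo, hi):
    """Return the row ids i with lo <= values[i] < hi."""
    first = bisect_right(edges, lo)  # bin containing lo
    last = bisect_right(edges, hi)   # bin containing hi

    # Bins strictly between the two boundary bins lie fully inside the
    # range, so their bitmaps are OR-ed together without touching raw data.
    sure = 0
    for k in range(first + 1, last):
        sure |= bitmaps[k]

    # Boundary bins only partly overlap the range: their rows are
    # candidates that must be checked against the raw values.
    maybe = bitmaps[first] | bitmaps[last]

    return [
        i
        for i in range(len(values))
        if (sure >> i) & 1 or ((maybe >> i) & 1 and lo <= values[i] < hi)
    ]


values = [0.5, 2.0, 3.7, 9.1, 5.5, 1.2]
edges = [1.0, 3.0, 6.0]  # assumed bin boundaries, chosen by hand
bitmaps = build_binned_bitmaps(values, edges)
print(range_query(values, edges, bitmaps, 1.5, 6.0))  # prints [1, 2, 4]
```

In this sketch the candidate check on the boundary bins is the cost that binning introduces. Fewer bins mean fewer bitmaps to store but more raw values to re-check, which is the trade-off that an adaptive binning strategy tries to balance.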