Modern big data applications integrate data from various sources. As a result, these datasets may not perfectly satisfy constraints, leading to sparse schema information and suboptimal query performance. The existing PatchIndex approach enables the definition of approximate constraints and improves query performance by exploiting the materialized constraint information. As real-world data warehouse workloads are often not limited to read-only queries, in this paper we extend the PatchIndex structure towards an update-conscious design. To this end, we present a sharded bitmap as the underlying data structure, which offers efficient update operations, and describe approaches to maintain approximate constraints under updates, avoiding index recomputations and full table scans. In our evaluation, we show that PatchIndexes significantly improve query performance while achieving lightweight update support.
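To make the idea of a sharded bitmap more concrete, the following is a minimal sketch of one plausible design, not the implementation described in the paper. It assumes that the bitmap is split into fixed-capacity shards and that each shard records the logical position of its first bit, so that deleting a bit shifts data only inside one shard and adjusts the start offsets of the shards that follow. The class name `ShardedBitmap`, the parameter `shard_bits`, and all method names are hypothetical.

```python
from bisect import bisect_right


class ShardedBitmap:
    """Illustrative sharded bitmap (hypothetical design, see assumptions above)."""

    def __init__(self, size, shard_bits=8):
        self.size = size
        self.shard_bits = shard_bits
        n_shards = (size + shard_bits - 1) // shard_bits
        # Each shard has a fixed capacity; plain lists of bools keep the sketch simple.
        self.shards = [[False] * shard_bits for _ in range(n_shards)]
        # Logical position of the first bit stored in each shard.
        self.starts = [i * shard_bits for i in range(n_shards)]

    def _locate(self, pos):
        if not 0 <= pos < self.size:
            raise IndexError(pos)
        # Last shard whose start offset is <= pos; this skips emptied shards.
        shard = bisect_right(self.starts, pos) - 1
        return shard, pos - self.starts[shard]

    def set(self, pos, value=True):
        shard, offset = self._locate(pos)
        self.shards[shard][offset] = value

    def get(self, pos):
        shard, offset = self._locate(pos)
        return self.shards[shard][offset]

    def delete(self, pos):
        """Remove the bit at pos; all later bits move down by one position."""
        shard, offset = self._locate(pos)
        # Shift only within the affected shard and pad to keep its capacity.
        del self.shards[shard][offset]
        self.shards[shard].append(False)
        # Later shards keep their contents; only their start offsets change.
        for i in range(shard + 1, len(self.starts)):
            self.starts[i] -= 1
        self.size -= 1
```

In this sketch, a delete touches at most `shard_bits` bits plus one integer per subsequent shard, rather than shifting the entire bitmap. That is the kind of trade-off a sharded layout is meant to offer; the actual shard size, bit encoding, and handling of inserts in PatchIndexes may differ.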