Database systems analyze queries to determine upfront which data is needed to answer them, and use indexes and other physical design techniques to speed up access to that data. However, for important classes of queries, e.g., HAVING and top-k queries, it is impossible to determine upfront what data is relevant. To overcome this limitation, we develop provenance-based data skipping (PBDS), a novel approach that generates provenance sketches to concisely encode what data is relevant for a query. Once a provenance sketch has been captured, it is used to speed up subsequent queries. PBDS can exploit physical design artifacts such as indexes and zone maps. Our approach significantly improves performance for both disk-based and main-memory database systems.
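To make the idea of a provenance sketch more concrete, the following is a minimal, hypothetical sketch in Python, not the PBDS implementation. It assumes that a sketch is the set of fragments of a range partition on the grouping attribute that contain input rows contributing to the result of a HAVING query. The names `fragment_of`, `having_query`, `capture_sketch`, and `skip_with_sketch`, the partition bounds, and the sample data are all illustrative assumptions. Because the partition is on the grouping attribute, each fragment holds whole groups, so skipping fragments cannot change the aggregate of any group that is kept.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical range partition on the grouping attribute g:
# fragment i covers [bounds[i], bounds[i + 1]).
def fragment_of(value, bounds):
    """Return the index of the range fragment containing value."""
    return bisect_right(bounds, value) - 1

def having_query(rows, threshold):
    """SELECT g, SUM(v) FROM rows GROUP BY g HAVING SUM(v) > threshold."""
    sums = defaultdict(int)
    for g, v in rows:
        sums[g] += v
    return {g: s for g, s in sums.items() if s > threshold}

def capture_sketch(rows, threshold, bounds):
    """Capture a sketch: the fragments holding groups that pass HAVING.

    Every row of a qualifying group lies in one of these fragments,
    so together they cover the query's provenance.
    """
    result = having_query(rows, threshold)
    return {fragment_of(g, bounds) for g in result}

def skip_with_sketch(rows, sketch, bounds):
    """Keep only rows in sketched fragments; all other rows are skipped."""
    return [(g, v) for g, v in rows if fragment_of(g, bounds) in sketch]

if __name__ == "__main__":
    rows = [(1, 5), (2, 50), (12, 3), (15, 4), (25, 70), (27, 1)]
    bounds = [0, 10, 20, 30]  # assumes 0 <= g < 30
    sketch = capture_sketch(rows, 40, bounds)
    print(sketch)  # {0, 2}
    pruned = skip_with_sketch(rows, sketch, bounds)
    # Rerunning the query on the pruned input gives the same answer.
    print(having_query(pruned, 40))  # {2: 50, 25: 70}
```

In a real system, such a sketch would more likely be translated into a selection predicate on the partition attribute (here, `g` in [0, 10) or [20, 30)), so that existing indexes or zone maps can skip the irrelevant data instead of scanning and filtering every row as this toy version does.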