Exploiting relationships among data, such as primary and foreign keys, is a classical query optimization technique. As persistent data is increasingly created and maintained programmatically (e.g., by web applications), prior work that infers data relationships by tabulating statistics from the stored data misses an important opportunity. We present ConstrOpt, the first tool that identifies data relationships by analyzing the programs that generate and maintain the persistent data. ConstrOpt then leverages the discovered constraints to optimize the application's physical design and to speed up query execution by rewriting queries. Instead of relying on a fixed set of predefined rewrite rules, ConstrOpt employs an enumerate-test-verify technique to automatically exploit the discovered data constraints. Each resulting rewrite is provably semantically equivalent to the original query. In experiments on 14 real-world web applications, ConstrOpt discovers over 4306 data constraints by analyzing application source code. On 3 of the evaluated applications, among queries with at least one constrained column, 42% can benefit from data layout optimization, and 35% are optimized by changing the application code. Finally, ConstrOpt's constraint-driven optimizer improves the performance of 826 queries, 9.8% of which achieve a speedup of more than 2x.
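To make the enumerate-test idea concrete, the following is a minimal sketch and not ConstrOpt's implementation. It assumes a single hypothetical rewrite family (dropping `DISTINCT`), a made-up `users` table, and a uniqueness constraint on `email` that is declared in the schema here only so the sketch can enforce it; in the setting described above, such a constraint would instead be inferred from application code. The formal verification step is omitted: candidates are only tested on sample data, which can reject a wrong rewrite but cannot prove a correct one.

```python
import sqlite3

def run(conn, sql):
    # Sort the rows so two results are compared as multisets, ignoring order.
    return sorted(conn.execute(sql).fetchall())

def candidates(query):
    # Enumerate step: a single hypothetical rewrite family for illustration.
    # Dropping DISTINCT is only sound when the projected column is unique.
    if "SELECT DISTINCT " in query:
        yield query.replace("SELECT DISTINCT ", "SELECT ", 1)

def surviving_rewrites(conn, query):
    # Test step: keep only candidates that agree with the original on the
    # sample data. A real system would still need to verify the survivors.
    expected = run(conn, query)
    return [c for c in candidates(query) if run(conn, c) == expected]

conn = sqlite3.connect(":memory:")
# Assumed schema: `email` is unique, `city` is not.
conn.execute(
    "CREATE TABLE users ("
    "id INTEGER PRIMARY KEY, email TEXT NOT NULL UNIQUE, city TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO users (email, city) VALUES (?, ?)",
    [("a@x.org", "Oslo"), ("b@x.org", "Oslo"), ("c@x.org", "Lima")],
)

# The constrained column admits the rewrite; the unconstrained one does not.
accepted = surviving_rewrites(conn, "SELECT DISTINCT email FROM users")
rejected = surviving_rewrites(conn, "SELECT DISTINCT city FROM users")
```

In this sketch the rewrite survives testing for `email` and is filtered out for `city`, because two sample rows share a city. Testing is used only as a cheap filter before a proof of equivalence is attempted.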