A recent line of work applies machine learning techniques to assist or rebuild cost-based query optimizers in DBMSs. While these approaches show superiority on some benchmarks, their deficiencies, e.g., unstable performance, high training cost, and slow model updating, stem from the inherent hardness of predicting the cost or latency of execution plans with machine learning models. In this paper, we introduce a learning-to-rank query optimizer, called Lero, which builds on top of the native query optimizer and continuously learns to improve query optimization. The key observation is that the relative order or rank of plans, rather than their exact cost or latency, is sufficient for query optimization. Lero employs a pairwise approach, training a classifier to compare any two plans and tell which one is better. Such a binary classification task is much easier than the regression task of predicting cost or latency, in terms of both model efficiency and effectiveness. Rather than building a learned optimizer from scratch, Lero is designed to leverage decades of database wisdom and improve the native optimizer. With its non-intrusive design, Lero can be implemented on top of any existing DBMS with minimal integration effort. We implement Lero on PostgreSQL and demonstrate its outstanding performance. In our experiments, Lero achieves near-optimal performance on several benchmarks. It reduces execution time by up to 70% relative to the native PostgreSQL optimizer and by up to 37% relative to other learned query optimizers. Meanwhile, Lero continuously learns and automatically adapts to query workloads and changes in data.
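To make the pairwise idea concrete, the following is a minimal sketch of a plan comparator, not Lero's actual model. It assumes each plan is already encoded as a small numeric feature vector and uses a plain logistic model on feature differences; the feature vectors, `TRUE_W`, and all function names are hypothetical, since the abstract does not specify the plan encoding or the classifier. The point it illustrates is that training only ever sees which plan of a pair was faster, never the latency values themselves.

```python
import math
import random

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def train_comparator(pairs, dim, epochs=50, lr=0.5):
    """Fit w so that sigmoid(w . (x_b - x_a)) estimates P(plan a is faster than plan b)."""
    w = [0.0] * dim
    for _ in range(epochs):
        for x_a, x_b, a_is_faster in pairs:
            diff = [b - a for a, b in zip(x_a, x_b)]
            z = max(-30.0, min(30.0, dot(w, diff)))  # clamp for numerical safety
            p = 1.0 / (1.0 + math.exp(-z))
            grad = p - a_is_faster                   # gradient of the logistic loss
            w = [wi - lr * grad * di for wi, di in zip(w, diff)]
    return w

def a_is_better(w, x_a, x_b):
    """Binary comparison: True if plan a is predicted to be faster than plan b."""
    return dot(w, x_a) < dot(w, x_b)

# Synthetic data (assumption): a hidden latency function the learner never observes.
random.seed(0)
TRUE_W = [3.0, 1.0, 0.5]

def latency(x):
    return math.exp(dot(TRUE_W, x))

def random_plan():
    return [random.random() for _ in TRUE_W]

def make_pairs(n):
    pairs = []
    for _ in range(n):
        a, b = random_plan(), random_plan()
        # Only the binary outcome is kept as the label, not the latencies.
        pairs.append((a, b, 1.0 if latency(a) < latency(b) else 0.0))
    return pairs

w = train_comparator(make_pairs(500), dim=len(TRUE_W))

test_pairs = make_pairs(200)
correct = sum(a_is_better(w, a, b) == (y == 1.0) for a, b, y in test_pairs)
accuracy = correct / len(test_pairs)

# Choose among candidate plans using only pairwise comparisons.
candidates = [random_plan() for _ in range(10)]
chosen = candidates[0]
for plan in candidates[1:]:
    if a_is_better(w, plan, chosen):
        chosen = plan

print(f"pairwise accuracy on held-out pairs: {accuracy:.2f}")
```

Because the label is just the outcome of a comparison, any monotonic rescaling of the hidden latency (here an exponential) leaves the training data unchanged, which is the sense in which ranking is an easier target than regressing the exact cost.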