Database Workload Characterization with Query Plan Encoders

Smart databases are adopting artificial intelligence (AI) technologies to achieve {\em instance optimality}, and in the future, databases will ship with prepackaged AI models in their core components. The reason is that every database runs different workloads and demands specific resources and settings to achieve optimal performance. This makes it necessary to understand comprehensively the workloads running in the system and their features, a problem we call workload characterization. To address it, we propose query plan encoders that learn essential features and their correlations from query plans. Our pretrained encoders capture the {\em structural} and the {\em computational performance} characteristics of queries independently. We show that the pretrained encoders adapt to new workloads, which expedites transfer learning. We assess the structural encoder and the performance encoders independently on multiple downstream tasks. For the overall evaluation of the query plan encoders, we design two downstream tasks: (i) query latency prediction and (ii) query classification. These tasks show the importance of feature-based workload characterization. We also perform extensive experiments on the individual encoders to verify the effectiveness of representation learning and domain adaptability.
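To make the split between structural and performance features concrete, the following is a minimal sketch in Python. It is not the encoders proposed here, which are learned and pretrained; it is a hand-crafted stand-in that only illustrates the two kinds of information a query plan carries. The `PlanNode` type, the `OPERATORS` vocabulary (PostgreSQL-style operator names), and the fields `est_rows` and `est_cost` are assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class PlanNode:
    """Hypothetical query plan node: an operator plus optimizer estimates."""
    op: str
    est_rows: float = 0.0
    est_cost: float = 0.0
    children: List["PlanNode"] = field(default_factory=list)


# Assumed operator vocabulary; a real system would derive it from its plans.
OPERATORS = ["Seq Scan", "Index Scan", "Hash Join", "Nested Loop", "Sort", "Aggregate"]


def walk(node: PlanNode) -> Iterator[PlanNode]:
    """Yield every node of the plan tree in pre-order."""
    yield node
    for child in node.children:
        yield from walk(child)


def depth(node: PlanNode) -> int:
    """Number of nodes on the longest root-to-leaf path."""
    return 1 + max((depth(c) for c in node.children), default=0)


def structural_features(root: PlanNode) -> list:
    """Operator counts in OPERATORS order, followed by the tree depth."""
    counts = Counter(n.op for n in walk(root))
    return [counts.get(op, 0) for op in OPERATORS] + [depth(root)]


def performance_features(root: PlanNode) -> list:
    """Estimated cost at the root and total estimated rows over all nodes."""
    return [root.est_cost, sum(n.est_rows for n in walk(root))]


# A small example plan: a hash join over a sequential scan and an index scan.
plan = PlanNode("Hash Join", 100, 250.0, [
    PlanNode("Seq Scan", 1000, 100.0),
    PlanNode("Index Scan", 50, 20.0),
])

print(structural_features(plan))   # [1, 1, 1, 0, 0, 0, 2]
print(performance_features(plan))  # [250.0, 1150]
```

Keeping the two feature vectors separate mirrors the idea of encoding structure and performance independently. A downstream model for latency prediction or query classification could then consume either vector or their concatenation.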
