Recently, the database management system (DBMS) community has witnessed the power of machine learning (ML) solutions for DBMS tasks. Despite their promising performance, these existing solutions can hardly be considered satisfactory. First, the ML-based methods in DBMS are not effective enough, because each is optimized for a single task in isolation and cannot explore or exploit the intrinsic connections between tasks. Second, their training process has serious limitations that hinder practicality: the entire model must be retrained from scratch for every new database (DB). Moreover, each retraining requires an excessive amount of training data, which is very expensive to acquire and often unavailable for a new DB. To tackle these fundamental drawbacks, we propose to explore the transferability of ML methods both across tasks and across DBs. In this paper, we propose a unified model, MTMLF, that uses a multi-task training procedure to capture transferable knowledge across tasks and a pre-train fine-tune procedure to distill transferable meta knowledge across DBs. We believe this paradigm is better suited to cloud DB services and has the potential to revolutionize how ML is used in DBMS. Furthermore, to demonstrate the predictive power and viability of MTMLF, we provide a concrete and very promising case study on query optimization tasks. Last but not least, we discuss several concrete research opportunities along this line of work.