Gaussian-Process-based Bayesian optimization (GP-BO), is a prevailing model-based framework for DBMS auto-tuning. However, recent work shows GP-BO-based DBMS auto-tuners significantly outperformed auto-tuners based on SMAC, which features random forest surrogate models; such results motivate us to rethink and investigate the limitations of GP-BO in auto-tuner design. We find the fundamental assumptions of GP-BO are widely violated when modeling and optimizing DBMS performance, while tree-ensemble-BOs (e.g., SMAC) can avoid the assumption pitfalls and deliver improved tuning efficiency and effectiveness. Moreover, we argue that existing tree-ensemble-BOs restrict further advancement in DBMS auto-tuning. First, existing tree-ensemble-BOs can only achieve distribution-free point estimates, but still impose unrealistic distributional assumptions on uncertainty estimates, compromising surrogate modeling and distort the acquisition function. Second, recent advances in gradient boosting, which can further enhance surrogate modeling against vanilla GP and random forest counterparts, have rarely been applied in optimizing DBMS auto-tuners. To address these issues, we propose a novel model-based DBMS auto-tuner, Centrum. Centrum improves distribution-free point and interval estimation in surrogate modeling with a two-phase learning procedure of stochastic gradient boosting ensembles. Moreover, Centrum adopts a generalized SGBE-estimated locally-adaptive conformal prediction to facilitate a distribution-free uncertainty estimation and acquisition function. To our knowledge, Centrum is the first auto-tuner to realize distribution-freeness, enhancing BO's practicality in DBMS auto-tuning, and the first to seamlessly fuse gradient boosting ensembles and conformal inference in BO. Extensive physical and simulation experiments on two DBMSs and three workloads show Centrum outperforms 21 SOTA methods.
翻译:暂无翻译