Machine learning (ML) methods have recently emerged as an effective way to automate parameter tuning of databases. State-of-the-art approaches include Bayesian optimization (BO) and reinforcement learning (RL). In this work, we describe our experience applying these methods to a database not yet studied in this context: FoundationDB. First, we describe the challenges we faced, such as unknown valid ranges of configuration parameters and combinations of parameter values that result in invalid runs, and how we mitigated them. While these issues are typically overlooked, we argue that they are a crucial barrier to the adoption of ML self-tuning techniques in databases, and thus deserve more attention from the research community. Second, we present experimental results obtained when tuning FoundationDB using ML methods. Unlike prior work in this domain, we also compare against the simplest of baselines: random search. Our results show that, while BO and RL methods can improve the throughput of FoundationDB by up to 38%, random search is a highly competitive baseline, finding a configuration that is only 4% worse than the one found by the vastly more complex ML methods. We conclude that future work in this area may want to focus more on randomized, model-free optimization algorithms.
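To make the random-search baseline concrete, the following is a minimal sketch under stated assumptions, not the procedure used in this work. The parameter names (`knob_a`, `knob_b`), their ranges, and the `run_benchmark` stand-in are hypothetical; a real setup would launch FoundationDB with the sampled configuration and measure throughput. The sketch also illustrates one simple way to handle parameter combinations that lead to invalid runs: skip them and count them.

```python
import random

# Hypothetical parameters and ranges (assumptions for illustration only);
# the actual FoundationDB parameters and valid ranges are not given here.
SEARCH_SPACE = {
    "knob_a": (1, 64),
    "knob_b": (1, 1024),
}


def sample_config(rng):
    """Draw each parameter uniformly at random from its assumed range."""
    return {name: rng.randint(lo, hi) for name, (lo, hi) in SEARCH_SPACE.items()}


def run_benchmark(config):
    """Toy stand-in for a benchmark run.

    Returns a throughput value, or None when the run is invalid.
    """
    # Toy rule: some parameter combinations make the run fail.
    if config["knob_a"] * config["knob_b"] > 32768:
        return None
    # Toy objective that peaks at knob_a = 32, knob_b = 512.
    return 1000.0 - 5 * abs(config["knob_a"] - 32) - 0.5 * abs(config["knob_b"] - 512)


def random_search(n_trials, seed=0):
    """Keep the best valid configuration seen over n_trials random samples."""
    rng = random.Random(seed)
    best_config, best_throughput = None, float("-inf")
    invalid_runs = 0
    for _ in range(n_trials):
        config = sample_config(rng)
        throughput = run_benchmark(config)
        if throughput is None:
            # Invalid run: record it and move on without updating the best.
            invalid_runs += 1
            continue
        if throughput > best_throughput:
            best_config, best_throughput = config, throughput
    return best_config, best_throughput, invalid_runs


if __name__ == "__main__":
    config, throughput, invalid = random_search(n_trials=200)
    print(config, throughput, invalid)
```

Because the search keeps no model of the objective, each trial is independent, so trials could be run in parallel and an invalid run costs only the time of that one trial.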