RocksDB is a general-purpose embedded key-value store used in many different settings. Its versatility comes at the cost of complex tuning configurations. This paper investigates maximizing the throughput of RocksDB IO operations by auto-tuning ten parameters with varying ranges. Off-the-shelf optimizers struggle with high-dimensional problem spaces and require a large number of training samples. We propose two techniques to tackle this problem: multi-task modeling and dimensionality reduction through clustering. By incorporating adjacent optimization in the model, the model converged faster and found complicated settings that other tuners could not find. This approach added computational overhead, which we mitigated by manually assigning parameters to each sub-goal based on our knowledge of RocksDB. The model is then incorporated into a standard Bayesian optimization loop to find parameters that maximize RocksDB's IO throughput. Our method achieved a 1.3x improvement when benchmarked against a simulation of Facebook's social graph traffic, and converged in ten optimization steps, compared with the fifty steps required by other state-of-the-art methods.
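To make the phrase "standard Bayesian optimization loop" concrete, the following is a minimal sketch and not the method proposed in this paper: it omits multi-task modeling and clustering, tunes a single normalized parameter instead of ten, and replaces the RocksDB benchmark with a hypothetical stand-in objective `toy_throughput`. The Gaussian-process surrogate, the RBF kernel, the upper-confidence-bound acquisition rule, and every name and constant below are assumptions chosen only for illustration.

```python
import math

def rbf(a, b, length=0.2):
    """Squared-exponential kernel; the length scale is an arbitrary choice."""
    return math.exp(-((a - b) ** 2) / (2 * length ** 2))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def gp_posterior(xs, ys, x_new, noise=1e-4):
    """Posterior mean and standard deviation of a GP surrogate at x_new."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k = [rbf(a, x_new) for a in xs]
    mean_y = sum(ys) / len(ys)  # constant prior mean set to the sample mean
    alpha = solve(K, [y - mean_y for y in ys])
    mu = mean_y + sum(ki * ai for ki, ai in zip(k, alpha))
    v = solve(K, k)
    var = max(1.0 - sum(ki * vi for ki, vi in zip(k, v)), 0.0)
    return mu, math.sqrt(var)

def toy_throughput(x):
    """Hypothetical stand-in for a benchmark run; peaks at x = 0.7."""
    return -(x - 0.7) ** 2

def bayes_opt(objective, steps=10, kappa=2.0):
    """Fit the surrogate, pick the point with the highest UCB, evaluate, repeat."""
    xs = [0.0, 1.0]                       # two initial evaluations
    ys = [objective(x) for x in xs]
    grid = [i / 100 for i in range(101)]  # candidate settings in [0, 1]
    for _ in range(steps):
        def ucb(x):
            mu, sigma = gp_posterior(xs, ys, x)
            return mu + kappa * sigma
        candidates = [x for x in grid if x not in xs]
        x_next = max(candidates, key=ucb)
        xs.append(x_next)
        ys.append(objective(x_next))
    best = max(range(len(xs)), key=lambda i: ys[i])
    return xs[best], ys[best]

if __name__ == "__main__":
    best_x, best_y = bayes_opt(toy_throughput, steps=10)
    print(f"best setting: {best_x:.2f}, objective: {best_y:.4f}")
```

In a real tuner, each call to the objective would correspond to one benchmark run against the database, which is why the number of optimization steps needed to converge is the cost that matters.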