Log-structured merge-trees (LSM trees) are increasingly used as the storage engines behind data systems, many of which are deployed in the cloud. As with other database architectures, LSM trees are tuned using information about the expected workload (e.g., reads vs. writes and point vs. range queries) to optimize performance. Operating in the cloud, however, comes with a degree of uncertainty due to multi-tenancy and the fast-evolving nature of modern applications. Databases with static tunings ignore the variability of such hybrid workloads and hence deliver inconsistent and overall suboptimal performance. To address this problem, we introduce ENDURE, a new paradigm for tuning LSM trees in the presence of workload uncertainty. Specifically, we focus on how the choice of compaction policy, size ratio, and memory allocation affects overall query performance. ENDURE considers a robust formulation of the throughput maximization problem and recommends a tuning that maximizes the worst-case throughput over a neighborhood of the expected workload. An uncertainty tuning parameter controls the size of this neighborhood, allowing the recommended tunings to be more conservative or more optimistic. We benchmark ENDURE on a state-of-the-art LSM-based storage engine, RocksDB, and show that its tunings comprehensively outperform tunings from classical strategies. Drawing on the results of our extensive analytical and empirical evaluation, we recommend the use of ENDURE for optimizing the performance of LSM tree-based storage engines.
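The robust formulation described above can be illustrated with a small sketch. This is not ENDURE's implementation: the cost model, the two-component workload (reads, writes), the use of KL divergence to define the neighborhood, the sampling-based search, and every function name below are assumptions made for illustration. Memory allocation is left out, and the sketch minimizes worst-case cost as a stand-in for maximizing worst-case throughput.

```python
import math
import random

# Hypothetical tuning space: (compaction policy, size ratio).
TUNINGS = [(p, t) for p in ("leveling", "tiering") for t in range(2, 11)]


def levels(size_ratio, entries=1_000_000, buffer_entries=1_000):
    """Rough number of levels for a given size ratio (toy model)."""
    return max(1, math.ceil(math.log(entries / buffer_entries, size_ratio)))


def cost(tuning, workload):
    """Toy per-query cost; lower cost stands in for higher throughput."""
    policy, size_ratio = tuning
    reads, writes = workload
    n = levels(size_ratio)
    if policy == "leveling":
        read_cost, write_cost = n, size_ratio * n   # cheap reads, costly writes
    else:
        read_cost, write_cost = size_ratio * n, n   # costly reads, cheap writes
    return reads * read_cost + writes * write_cost


def kl(p, q):
    """KL divergence KL(p || q); assumes every entry of q is positive."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def neighborhood(expected, rho, samples=2000, seed=0):
    """Sampled workloads within divergence rho of the expected workload."""
    rng = random.Random(seed)
    points = [expected]
    for _ in range(samples):
        r = rng.random()
        candidate = (r, 1.0 - r)
        if kl(candidate, expected) <= rho:
            points.append(candidate)
    return points


def worst_case_cost(tuning, workloads):
    return max(cost(tuning, w) for w in workloads)


def nominal_tuning(expected):
    """Classical strategy: optimize for the expected workload only."""
    return min(TUNINGS, key=lambda t: cost(t, expected))


def robust_tuning(expected, rho):
    """Robust strategy: optimize the worst case over the neighborhood."""
    workloads = neighborhood(expected, rho)
    return min(TUNINGS, key=lambda t: worst_case_cost(t, workloads))


if __name__ == "__main__":
    expected = (0.9, 0.1)  # 90% reads, 10% writes
    for rho in (0.0, 0.1, 0.5):
        print(rho, nominal_tuning(expected), robust_tuning(expected, rho))
```

Here `rho` plays the role of the uncertainty parameter: a larger value widens the neighborhood and pushes the recommendation toward tunings that hold up under workloads further from the expected one.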