Constructing and Analyzing the LSM Compaction Design Space

Log-structured merge (LSM) trees offer efficient ingestion by appending incoming data and are therefore widely used as the storage layer of production NoSQL data stores. To enable competitive read performance, LSM-trees periodically reorganize data through iterative compactions, forming a tree whose levels have exponentially increasing capacity. Compactions fundamentally influence the performance of an LSM engine in terms of write amplification, write throughput, point and range lookup performance, space amplification, and delete performance. Choosing the appropriate compaction strategy is therefore crucial. It is also hard, because the LSM compaction design space is vast, largely unexplored, and has not been formally defined in the literature. As a result, most LSM-based engines use a fixed compaction strategy, typically hand-picked by an engineer, which decides how and when to compact data. In this paper, we present the design space of LSM compactions and evaluate state-of-the-art compaction strategies with respect to key performance metrics. Toward this goal, our first contribution is a set of four design primitives that can formally define any compaction strategy: (i) the compaction trigger, (ii) the data layout, (iii) the compaction granularity, and (iv) the data movement policy. Together, these primitives can synthesize both existing and completely new compaction strategies. Our second contribution is an experimental analysis of 10 compaction strategies. We present 12 observations and 7 high-level takeaway messages, which show how LSM systems can navigate the compaction design space.
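As a rough illustration of the four primitives named above, a compaction strategy can be modeled as a record with one field per primitive. The sketch below is not taken from the paper: the enum members, the `CompactionStrategy` class, and the helpers `level_capacity` and `needs_compaction` are hypothetical names chosen for illustration, and the capacity formula simply assumes each level is a fixed `size_ratio` times larger than the one above it.

```python
from dataclasses import dataclass
from enum import Enum


# The four enums mirror the four primitives listed in the abstract.
# Their members are illustrative assumptions, not the paper's taxonomy.
class Trigger(Enum):
    LEVEL_SATURATION = "level saturation"


class DataLayout(Enum):
    LEVELING = "leveling"
    TIERING = "tiering"


class Granularity(Enum):
    WHOLE_LEVEL = "whole level"
    SINGLE_FILE = "single file"


class DataMovement(Enum):
    ROUND_ROBIN = "round-robin"
    LEAST_OVERLAP = "least overlap"


@dataclass(frozen=True)
class CompactionStrategy:
    """One point in the design space: a choice for each primitive."""
    trigger: Trigger
    layout: DataLayout
    granularity: Granularity
    movement: DataMovement


def level_capacity(buffer_entries: int, size_ratio: int, level: int) -> int:
    """Capacity of a 1-based level, assuming exponential growth by size_ratio."""
    return buffer_entries * size_ratio ** level


def needs_compaction(entries: int, buffer_entries: int,
                     size_ratio: int, level: int) -> bool:
    """Hypothetical saturation trigger: fire once the level reaches capacity."""
    return entries >= level_capacity(buffer_entries, size_ratio, level)


# Example: a leveled strategy that compacts one file at a time.
strategy = CompactionStrategy(
    trigger=Trigger.LEVEL_SATURATION,
    layout=DataLayout.LEVELING,
    granularity=Granularity.SINGLE_FILE,
    movement=DataMovement.LEAST_OVERLAP,
)
```

Treating a strategy as a plain tuple of choices makes it easy to enumerate combinations of primitives, which is the sense in which a set of primitives can describe both existing and new strategies.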

