In recent years, the Log-Structured Merge (LSM) tree has been widely adopted by NoSQL and NewSQL systems for its superior write performance. Despite this popularity, most existing work has focused on LSM-based key-value stores that have only a primary LSM-tree index; auxiliary structures, which are critical for supporting ad-hoc queries, have received much less attention. In this paper, we focus on efficient data ingestion and query processing for general-purpose LSM-based storage systems. We first propose and evaluate a series of optimizations for efficient batched point lookups, which significantly broaden the range of applicability of LSM-based secondary indexes. We then present several new and efficient maintenance strategies for LSM-based storage systems. Finally, we have implemented and experimentally evaluated the proposed techniques in the context of the Apache AsterixDB system, and we present the results here.