We introduce Proteus, a novel self-designing approximate range filter, which configures itself based on sampled data in order to optimize its false positive rate (FPR) for a given space requirement. Proteus unifies the probabilistic and deterministic design spaces of state-of-the-art range filters to achieve robust performance across a wider variety of use cases. At the core of Proteus lies our Contextual Prefix FPR (CPFPR) model, a formal framework for the FPR of prefix-based filters across their design spaces. We empirically demonstrate the accuracy of our model and Proteus' ability to optimize over both synthetic workloads and real-world datasets. We further evaluate Proteus in RocksDB and show that it improves end-to-end performance by as much as 5.3x over more brittle state-of-the-art methods such as SuRF and Rosetta. Our experiments also indicate that the cost of modeling is not significant compared to the end-to-end performance gains and that Proteus is robust to workload shifts.
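To make the notion of a prefix-based range filter and its FPR concrete, the following is a minimal sketch and not Proteus itself. It assumes 8-bit integer keys, stores key prefixes in an exact Python set rather than a trie or Bloom filter, and measures the FPR by brute force over sample queries as a stand-in for the CPFPR model. All names (`ToyPrefixFilter`, `empirical_fpr`, `prefix`) are hypothetical.

```python
def prefix(key: int, prefix_len: int, key_bits: int = 8) -> int:
    """Return the top `prefix_len` bits of a `key_bits`-bit integer key."""
    return key >> (key_bits - prefix_len)


class ToyPrefixFilter:
    """Toy range filter: an exact set of key prefixes of one fixed length."""

    def __init__(self, keys, prefix_len: int, key_bits: int = 8):
        self.prefix_len = prefix_len
        self.key_bits = key_bits
        self.prefixes = {prefix(k, prefix_len, key_bits) for k in keys}

    def may_contain_range(self, lo: int, hi: int) -> bool:
        """True if any stored prefix overlaps the inclusive range [lo, hi]."""
        first = prefix(lo, self.prefix_len, self.key_bits)
        last = prefix(hi, self.prefix_len, self.key_bits)
        return any(p in self.prefixes for p in range(first, last + 1))


def empirical_fpr(filt, keys, queries) -> float:
    """Fraction of truly empty range queries that the filter cannot rule out."""
    empty = [(lo, hi) for lo, hi in queries
             if not any(lo <= k <= hi for k in keys)]
    if not empty:
        return 0.0
    false_positives = sum(filt.may_contain_range(lo, hi) for lo, hi in empty)
    return false_positives / len(empty)


# Assumed sample data: two keys and three range queries (one non-empty).
keys = [16, 165]
queries = [(18, 20), (32, 40), (160, 170)]
for plen in (4, 8):
    filt = ToyPrefixFilter(keys, plen)
    print(plen, len(filt.prefixes), empirical_fpr(filt, keys, queries))
```

In this toy setting the 4-bit filter reports the empty range [18, 20] as possibly non-empty because it shares a prefix with key 16, while the full-length filter rules it out. Shorter prefixes generally need less space but admit more false positives, so choosing a configuration from sampled keys and queries under a space budget is the kind of trade-off the abstract describes.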