Prefix Filter: Practically and Theoretically Better Than Bloom

October 25, 2022

Tomer Even, Guy Even, Adam Morrison

From arXiv; full version of the VLDB '22 paper.

Many applications of approximate membership query data structures, or filters, require only an incremental filter, one that supports insertions but not deletions. However, the design space of incremental filters lacks a "sweet spot" filter that combines space efficiency, fast queries, and fast insertions. Incremental filters such as the Bloom and blocked Bloom filters are not space efficient. Dynamic filters (i.e., filters that support deletions), such as the cuckoo and vector quotient filters, are space efficient but do not exhibit consistently fast insertions and queries. In this paper, we propose the prefix filter, an incremental filter that addresses this challenge: (1) its space (in bits) is similar to that of state-of-the-art dynamic filters; (2) its query throughput is high and comparable to that of the cuckoo filter; and (3) its insert throughput is high, with overall build times $1.39\times$ to $1.46\times$ faster than the vector quotient filter and $3.2\times$ to $3.5\times$ faster than the cuckoo filter. We present a rigorous analysis of the prefix filter that also holds for practical set sizes (e.g., $n=2^{25}$). The analysis covers the probability of failure, the false positive rate, and the probability that an operation requires accessing more than a single cache line.
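
For context on the claim that Bloom filters are not space efficient, a standard textbook result (not taken from this paper) compares the space of a Bloom filter that uses the optimal number of hash functions with the information-theoretic lower bound for any filter. Here $m$ is the number of bits, $n$ the number of stored elements, and $\varepsilon$ the target false positive rate; these symbols are introduced for illustration only.

$$
\left.\frac{m}{n}\right|_{\text{Bloom}} = \frac{\log_2(1/\varepsilon)}{\ln 2} \approx 1.44\,\log_2(1/\varepsilon),
\qquad
\left.\frac{m}{n}\right|_{\text{any filter}} \ge \log_2(1/\varepsilon).
$$

For example, at $\varepsilon = 2^{-8} \approx 0.39\%$ a Bloom filter needs about $8/\ln 2 \approx 11.5$ bits per element against a lower bound of 8, roughly 44% overhead. Because the overhead is a constant factor, the absolute gap of about $0.44\log_2(1/\varepsilon)$ bits per element grows as $\varepsilon$ shrinks; a blocked Bloom filter, which confines each key to one cache-line-sized block, typically needs somewhat more space than a standard Bloom filter for the same $\varepsilon$.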

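
The abstract names the blocked Bloom filter as an incremental (insert-only) baseline and analyzes how often an operation must touch more than one cache line. A minimal sketch of a blocked Bloom filter illustrates both ideas: every key hashes to a single block, so an insert or query touches exactly one cache line, and the structure offers no delete operation. It assumes 64-byte (512-bit) cache lines, $k = 8$ bit positions per key, and BLAKE2b as the hash; the class `BlockedBloomFilter` and its parameters are hypothetical. This is not the prefix filter, and Python integers stand in for bit blocks, so the sketch shows the memory layout rather than realistic performance.

```python
import hashlib

# Assumption: 64-byte cache lines, so one block = 512 bits.
BLOCK_BITS = 512


class BlockedBloomFilter:
    """Hypothetical insert-only (incremental) filter: there is no delete()."""

    def __init__(self, num_blocks: int, k: int = 8) -> None:
        # 8 digest bytes pick the block and 2 bytes pick each bit: 8 + 2k <= 32.
        if num_blocks < 1 or not 1 <= k <= 12:
            raise ValueError("need num_blocks >= 1 and 1 <= k <= 12")
        self.num_blocks = num_blocks
        self.k = k
        # Each Python int models one 512-bit block (one cache line).
        self.blocks = [0] * num_blocks

    def _locate(self, key: bytes) -> tuple[int, int]:
        """Return (block index, bit mask of the k positions inside that block)."""
        # Assumption: BLAKE2b stands in for the fast hash a real filter would use.
        digest = hashlib.blake2b(key, digest_size=32).digest()
        block = int.from_bytes(digest[:8], "little") % self.num_blocks
        mask = 0
        for i in range(self.k):
            chunk = digest[8 + 2 * i : 10 + 2 * i]
            mask |= 1 << (int.from_bytes(chunk, "little") % BLOCK_BITS)
        return block, mask

    def insert(self, key: bytes) -> None:
        # Sets bits in exactly one block, i.e. touches one cache line.
        block, mask = self._locate(key)
        self.blocks[block] |= mask

    def query(self, key: bytes) -> bool:
        # Always True for inserted keys; may be True for others (false positive).
        block, mask = self._locate(key)
        return (self.blocks[block] & mask) == mask


if __name__ == "__main__":
    n = 10_000
    # Roughly 10 bits per key: 196 blocks * 512 bits = 100,352 bits.
    bf = BlockedBloomFilter(num_blocks=196, k=8)
    for i in range(n):
        bf.insert(f"key-{i}".encode())
    assert all(bf.query(f"key-{i}".encode()) for i in range(n))
    false_positives = sum(bf.query(f"other-{i}".encode()) for i in range(n))
    print(f"measured false positive rate: {false_positives / n:.4f}")
```

Because bits are only ever set, a query never returns a false negative for an inserted key. The price of packing all $k$ bits into one block is a somewhat higher false positive rate than a standard Bloom filter with the same number of bits, since some blocks receive more keys than average.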
