Daisy Bloom Filters - 专知 Papers


May 30, 2022

Daisy Bloom Filters


Ioana O. Bercea, Jakob Bæk Tejs Houen, Rasmus Pagh

From arXiv, 16 pages, 1 figure

Weighted Bloom filters (Bruck, Gao and Jiang, ISIT 2006) are Bloom filters that adapt the number of hash functions according to the query element. That is, they use a sequence of hash functions $h_1, h_2, \dots$ and insert $x$ by setting the bits in $k_x$ positions $h_1(x), h_2(x), \dots, h_{k_x}(x)$ to 1, where the parameter $k_x$ depends on $x$. Similarly, a query for $x$ checks whether the bits at positions $h_1(x), h_2(x), \dots, h_{k_x}(x)$ contain a $0$ (in which case we know that $x$ was not inserted) or contain only $1$s (in which case $x$ may have been inserted, but it could also be a false positive). In this paper, we determine a near-optimal choice of the parameters $k_x$ in a model where $n$ elements are inserted independently from a probability distribution $\mathcal{P}$ and query elements are chosen from a probability distribution $\mathcal{Q}$, under a bound on the false positive probability $F$. In contrast, the parameter choice of Bruck et al., as well as follow-up work by Wang et al., does not guarantee a nontrivial bound on the false positive rate. We refer to our parameterization of the weighted Bloom filter as a $\textit{Daisy Bloom filter}$. For many distributions $\mathcal{P}$ and $\mathcal{Q}$, the space usage of the Daisy Bloom filter is significantly smaller than that of standard Bloom filters. Our upper bound is complemented by an information-theoretic lower bound, showing that (with mild restrictions on the distributions $\mathcal{P}$ and $\mathcal{Q}$) the space usage of Daisy Bloom filters is the best possible up to a constant factor. Daisy Bloom filters can be seen as a fine-grained variant of a recent data structure of Vaidya, Knorr, Mitzenmacher and Kraska. Like their work, we are motivated by settings in which we have prior knowledge of the workload of the filter, possibly in the form of advice from a machine learning algorithm.
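The insert and query rules of a weighted Bloom filter described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class name `WeightedBloomFilter`, the salted SHA-256 construction of $h_i$, and the rule `toy_k` for choosing $k_x$ are all assumptions made for illustration. In particular, `toy_k` is an arbitrary placeholder and is not the Daisy parameterization, which the abstract says is derived from $\mathcal{P}$, $\mathcal{Q}$ and $F$.

```python
import hashlib
from typing import Callable


class WeightedBloomFilter:
    """Bloom filter whose number of hash functions k_x depends on the element x."""

    def __init__(self, num_bits: int, k_of: Callable[[object], int]) -> None:
        self.num_bits = num_bits
        self.k_of = k_of                 # maps x to k_x (policy supplied by the caller)
        self.bits = bytearray(num_bits)  # one byte per bit, kept simple for clarity

    def _position(self, x: object, i: int) -> int:
        # h_i(x): assumed construction, salting SHA-256 with the index i.
        digest = hashlib.sha256(f"{i}:{x!r}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.num_bits

    def insert(self, x: object) -> None:
        # Set the bits at positions h_1(x), ..., h_{k_x}(x) to 1.
        for i in range(1, self.k_of(x) + 1):
            self.bits[self._position(x, i)] = 1

    def query(self, x: object) -> bool:
        # False if any probed bit is 0 (x was definitely not inserted);
        # True if all probed bits are 1 (x may have been inserted).
        # With k_x = 0 no bits are probed, so the answer is always True.
        return all(self.bits[self._position(x, i)]
                   for i in range(1, self.k_of(x) + 1))


def toy_k(x: object) -> int:
    # Hypothetical rule: fewer hash functions for two chosen elements.
    return 2 if x in {"alice", "bob"} else 6


filt = WeightedBloomFilter(num_bits=1024, k_of=toy_k)
filt.insert("alice")
filt.insert("carol")
print(filt.query("alice"))  # True: an inserted element is never reported absent
print(filt.query("carol"))  # True
```

A query for an element that was never inserted usually returns `False`, but it can return `True` when all of its $k_x$ positions happen to be set by other elements; this is the false positive case that the choice of $k_x$ is meant to control.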

