In-stream Probabilistic Cardinality Estimation for Bloom Filters - Zhuanzhi Papers


Bloom filter · Estimation/Estimator · Variance · Stream · Learning to Hash

October 27, 2022

In-stream Probabilistic Cardinality Estimation for Bloom Filters


Remy Scholler, Jean-Francois Couchot, Oumaima Alaoui-Ismaili, Denis Renaud, Eric Ballot

From arXiv; 12 pages, 10 figures, 3 tables

The amount of data coming from different sources such as IoT sensors, social networks, and cellular networks has increased exponentially during the last few years. Probabilistic Data Structures (PDS) are efficient alternatives to deterministic data structures for large-scale data processing and streaming applications. They are mainly used for approximate membership queries, frequency counting, cardinality estimation, and similarity search. Finding the number of distinct elements in a large dataset or in streaming data is an active research area. In this work, we show that the usual Bloom filter-based methods for this kind of cardinality estimation are relatively accurate on average but have a high variance. Reducing this variance is therefore important for obtaining accurate statistics. We propose a probabilistic approach to estimate the cardinality of a Bloom filter more accurately from its parameters, i.e., the number of hash functions $k$, the size $m$, and a counter $s$ that is incremented whenever an element is not in the filter (i.e., whenever the membership query for this element returns negative). Because a Bloom filter has no false negatives, the counter can never exceed the exact cardinality, but hash collisions can cause it to underestimate it: a new element whose $k$ bits are all already set is reported as present and is not counted. This creates a counting error that we estimate accurately, in-stream, along with its standard deviation. We also discuss a way to optimize the parameters of a Bloom filter based on its counting error. We evaluate our approach with synthetic data created from an analysis of a real mobility dataset provided by a mobile network operator in the form of displacement matrices computed from mobile phone records. The proposed approach performs at least as well on average as state-of-the-art methods and has a much lower variance (about 6 to 7 times lower).
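The setup described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' method or code; all names (`CountingBloomFilter`, `bit_count_estimate`, `expected_counter`) are hypothetical. It assumes SHA-256-based double hashing to derive the $k$ positions and stores one byte per bit for simplicity. The filter keeps the in-stream counter $s$ described above, so $s$ never exceeds the true number of distinct elements. For comparison it includes a widely used estimator based on the number $X$ of set bits, $\hat{n} = -\frac{m}{k}\ln\left(1 - \frac{X}{m}\right)$. It also includes a textbook approximation of the expected counter under independent, uniform hashing, $E[s] \approx \sum_{i=0}^{n-1}\left[1 - \left(1 - (1 - 1/m)^{ki}\right)^k\right]$. The gap between $n$ and $E[s]$ is the kind of counting error the paper sets out to estimate.

```python
import hashlib
import math


class CountingBloomFilter:
    """Bloom filter with k hash functions over m bits, plus an in-stream
    counter s incremented whenever a membership query is negative.
    Hypothetical illustration, not the paper's implementation."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = bytearray(m)  # one byte per bit, for simplicity
        self.s = 0                # number of elements reported as "new"

    def _positions(self, item: str) -> list:
        # Assumed hashing scheme: double hashing g_i(x) = h1(x) + i*h2(x) mod m,
        # with h1 and h2 taken from one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> bool:
        """Insert item; return True if the membership query was negative
        (so the item was counted in s)."""
        positions = self._positions(item)
        is_new = not all(self.bits[p] for p in positions)
        if is_new:
            self.s += 1
        for p in positions:
            self.bits[p] = 1
        return is_new

    def bits_set(self) -> int:
        return sum(self.bits)


def bit_count_estimate(m: int, k: int, x: int) -> float:
    """Estimate the cardinality from the number x of set bits:
    n_hat = -(m / k) * ln(1 - x / m)."""
    if x >= m:
        return float("inf")  # saturated filter: no finite estimate
    return -(m / k) * math.log(1 - x / m)


def expected_counter(n: int, m: int, k: int) -> float:
    """Approximate E[s] after n distinct insertions, assuming independent,
    uniformly distributed hash positions."""
    total = 0.0
    for i in range(n):
        fill = 1.0 - (1.0 - 1.0 / m) ** (k * i)  # P(a given bit is set) after i items
        total += 1.0 - fill ** k                # P(item i+1 is NOT a false positive)
    return total


if __name__ == "__main__":
    m, k, n = 10_000, 5, 2_000
    bf = CountingBloomFilter(m, k)
    for j in range(n):
        bf.add(f"item-{j}")
        bf.add(f"item-{j}")  # exact duplicate: always found, never counted twice
    print("true distinct n:", n)
    print("counter s:", bf.s)
    print("approx E[s]:", round(expected_counter(n, m, k), 1))
    print("bit-count estimate:", round(bit_count_estimate(m, k, bf.bits_set()), 1))
```

Exact duplicates are never counted twice, because a Bloom filter has no false negatives. The only source of undercounting is a genuinely new element whose $k$ positions were already set by earlier elements. This error grows as the filter fills up, which is why $m$ and $k$ matter when the counter is used as a cardinality estimate.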


