用 Taffy 过滤器拉伸数据 (Stretching Your Data With Taffy Filters) - 专知论文

会员服务 ·

0

假阳性 · 哈希学习 · 总回报 · 近似 · Networking ·

2021 年 9 月 4 日

Stretching Your Data With Taffy Filters

翻译：用 Taffy 过滤器拉伸数据

from arxiv, 11 pages, 13 figures

Popular approximate membership query structures such as Bloom filters and cuckoo filters are widely used in databases, security, and networking. These structures support two operations -- insert and lookup; lookup always returns true on elements inserted into the structure, while it returns true with some probability $\varepsilon \ll 1$ on elements not inserted into the structure. These latter elements are called false positives. Compensatory for these false positives, filters can be much smaller than hash tables that represent the same set. However, unlike hash tables, cuckoo filters and Bloom filters must be initialized with the intended number of inserts to be performed, and cannot grow larger -- inserts beyond this number may fail or increase the false positive probability. This paper presents designs and implementations of filters than can grow without inserts failing and without increasing the false positive probability, even if the filters are created with a small initial size. The resulting code is open source and available for general use.

翻译：Bloom 过滤器和 cuckoo 过滤器等广受欢迎的成员查询结构在数据库、安全和网络中广泛使用。这些结构支持两种操作 -- -- 插入和查看;查找总是对插入结构的元素真实返回,而对于未插入结构的元素则返回真实,但可能性为$\varepsilon\ll 1美元。这些元素被称为假正数。对这些假正数的补偿,过滤器可能比代表同一组的散列表要小得多。然而,与散列表格不同,必须先用预想的插入数来初始化 cuckoo 过滤器和 Bloom 过滤器,且不能扩大 -- -- 超过此数的插入可能失败或增加错误正概率。本文显示过滤器的设计和实施比在不插入失败和不增加假正概率的情况下增长的长得更多,即使过滤器创建的初始大小较小。因此生成的代码是开放源,可供一般使用。

0

相关内容

假阳性

50+篇《神经架构搜索NAS》2020论文合集

专知会员服务

61+阅读 · 2020年3月19日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

165+阅读 · 2020年3月18日

【2020新书】Python大数据处理，Mastering Large Datasets with Python

【2020新书】Python大数据处理，Mastering Large Datasets with Python

专知会员服务

54+阅读 · 2020年2月2日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

《DeepGCNs: Making GCNs Go as Deep as CNNs》

《DeepGCNs: Making GCNs Go as Deep as CNNs》

专知会员服务

31+阅读 · 2019年10月17日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

MIT新书《强化学习与最优控制》

MIT新书《强化学习与最优控制》

专知会员服务

280+阅读 · 2019年10月9日

最新BERT相关论文清单，BERT-related Papers

最新BERT相关论文清单，BERT-related Papers

专知会员服务

53+阅读 · 2019年9月29日

【电子书推荐】Data Science with Python and Dask

【电子书推荐】Data Science with Python and Dask

专知会员服务

44+阅读 · 2019年6月1日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

已删除

将门创投

5+阅读 · 2019年5月5日

Call for Participation: Shared Tasks in NLPCC 2019

Call for Participation: Shared Tasks in NLPCC 2019

中国计算机学会

5+阅读 · 2019年3月22日

【TED】生命中的每一年的智慧

【TED】生命中的每一年的智慧

英语演讲视频每日一推

10+阅读 · 2019年1月29日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

Python机器学习教程资料/代码

Python机器学习教程资料/代码

机器学习研究会

8+阅读 · 2018年2月22日

【学习】(Python)SVM数据分类

【学习】(Python)SVM数据分类

机器学习研究会

6+阅读 · 2017年10月15日

【推荐】用Tensorflow理解LSTM

【推荐】用Tensorflow理解LSTM

机器学习研究会

36+阅读 · 2017年9月11日

Understanding neural networks with reproducing kernel Banach spaces

Arxiv

0+阅读 · 2021年10月26日

Cockpit: A Practical Debugging Tool for the Training of Deep Neural Networks

Arxiv

0+阅读 · 2021年10月26日

A Survey on Neural Recommendation: From Collaborative Filtering to Information-rich Recommendation

Arxiv

0+阅读 · 2021年10月22日

A Bayesian spatio-temporal analysis of markets during the Finnish 1860s famine

Arxiv

0+阅读 · 2021年10月22日

A Global Stochastic Optimization Particle Filter Algorithm

Arxiv

0+阅读 · 2021年10月21日

Training Graph Neural Networks with 1000 Layers

Arxiv

13+阅读 · 2021年6月14日

Spectral Clustering with Graph Neural Networks for Graph Pooling

Arxiv

25+阅读 · 2020年6月3日

Integrated Object Detection and Tracking with Tracklet-Conditioned Detection

Arxiv

3+阅读 · 2018年11月27日

Self-Attentive Neural Collaborative Filtering

Arxiv

5+阅读 · 2018年7月19日

Multi-Source Neural Machine Translation with Missing Data

Arxiv

5+阅读 · 2018年6月7日

VIP会员

文章信息

相关主题

相关VIP内容

50+篇《神经架构搜索NAS》2020论文合集

专知会员服务

61+阅读 · 2020年3月19日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

165+阅读 · 2020年3月18日

【2020新书】Python大数据处理，Mastering Large Datasets with Python

【2020新书】Python大数据处理，Mastering Large Datasets with Python

专知会员服务

54+阅读 · 2020年2月2日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

《DeepGCNs: Making GCNs Go as Deep as CNNs》

《DeepGCNs: Making GCNs Go as Deep as CNNs》

专知会员服务

31+阅读 · 2019年10月17日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

MIT新书《强化学习与最优控制》

MIT新书《强化学习与最优控制》

专知会员服务

280+阅读 · 2019年10月9日

最新BERT相关论文清单，BERT-related Papers

最新BERT相关论文清单，BERT-related Papers

专知会员服务

53+阅读 · 2019年9月29日

【电子书推荐】Data Science with Python and Dask

【电子书推荐】Data Science with Python and Dask

专知会员服务

44+阅读 · 2019年6月1日

热门VIP内容

开通专知VIP会员享更多权益服务

《无人机战争时代的战时法：大国竞争中的区分原则、相称性原则与行动建议》最新75页

《构建强健军事力量的设计挑战：提升海军兵力支持系统效能的多分辨率建模方法》69页

正视无人机心理战：恐惧效应与战略反思

《精确反蜂群防御系统：三维运动探测与定向空爆拦截技术融合》最新24页

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

已删除

将门创投

5+阅读 · 2019年5月5日

Call for Participation: Shared Tasks in NLPCC 2019

Call for Participation: Shared Tasks in NLPCC 2019

中国计算机学会

5+阅读 · 2019年3月22日

【TED】生命中的每一年的智慧

【TED】生命中的每一年的智慧

英语演讲视频每日一推

10+阅读 · 2019年1月29日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

Python机器学习教程资料/代码

Python机器学习教程资料/代码

机器学习研究会

8+阅读 · 2018年2月22日

【学习】(Python)SVM数据分类

【学习】(Python)SVM数据分类

机器学习研究会

6+阅读 · 2017年10月15日

【推荐】用Tensorflow理解LSTM

【推荐】用Tensorflow理解LSTM

机器学习研究会

36+阅读 · 2017年9月11日

相关论文

Understanding neural networks with reproducing kernel Banach spaces

Arxiv

0+阅读 · 2021年10月26日

Cockpit: A Practical Debugging Tool for the Training of Deep Neural Networks

Arxiv

0+阅读 · 2021年10月26日

A Survey on Neural Recommendation: From Collaborative Filtering to Information-rich Recommendation

Arxiv

0+阅读 · 2021年10月22日

A Bayesian spatio-temporal analysis of markets during the Finnish 1860s famine

Arxiv

0+阅读 · 2021年10月22日

A Global Stochastic Optimization Particle Filter Algorithm

Arxiv

0+阅读 · 2021年10月21日

Training Graph Neural Networks with 1000 Layers

Arxiv

13+阅读 · 2021年6月14日

Spectral Clustering with Graph Neural Networks for Graph Pooling

Arxiv

25+阅读 · 2020年6月3日

Integrated Object Detection and Tracking with Tracklet-Conditioned Detection

Arxiv

3+阅读 · 2018年11月27日

Self-Attentive Neural Collaborative Filtering

Arxiv

5+阅读 · 2018年7月19日

Multi-Source Neural Machine Translation with Missing Data

Arxiv

5+阅读 · 2018年6月7日

微信扫码咨询专知VIP会员