用 Taffy 过滤器拉伸数据 (Stretching Your Data With Taffy Filters) - 专知论文

会员服务 ·

0

假阳性 · 哈希学习 · 总回报 · 近似 · Networking ·

2021 年 9 月 8 日

Stretching Your Data With Taffy Filters

翻译：用 Taffy 过滤器拉伸数据

from arxiv, 11 pages, 13 figures

Popular approximate membership query structures such as Bloom filters and cuckoo filters are widely used in databases, security, and networking. These structures support two operations -- insert and lookup; lookup always returns true on elements inserted into the structure, while it returns true with some probability $\varepsilon \ll 1$ on elements not inserted into the structure. These latter elements are called false positives. Compensatory for these false positives, filters can be much smaller than hash tables that represent the same set. However, unlike hash tables, cuckoo filters and Bloom filters must be initialized with the intended number of inserts to be performed, and cannot grow larger -- inserts beyond this number may fail or increase the false positive probability. This paper presents designs and implementations of filters than can grow without inserts failing and without significantly increasing the false positive probability, even if the filters are created with a small initial size. The resulting code is available on GitHub under a permissive open source license.

翻译：Bloom 过滤器和 cuckoo 过滤器等广受欢迎的成员查询结构在数据库、安全和网络中广泛使用。这些结构支持两种操作 -- -- 插入和查看;查找总是对插入结构的元素真实返回,而对于未插入结构的元素则返回真实,但可能性为$\varepsilon\ll 1美元。这些元素被称为假正数。对这些假正数的补偿,过滤器可能比代表同一组的散列表要小得多。然而,与散列表格不同,必须初始化 cuckoo 过滤器和闪烁过滤器,并启动要完成的插入数量,且不能扩大 -- -- 超过此数的插入可能失败或增加错误正概率。本文显示过滤器的设计和实施,但不能在不插入失败的情况下增长,而且不会大大增加假正概率,即使过滤器的初始大小较小。因此在 GitHub 上可以使用允许的开源许可。

0

相关内容

假阳性

【干货书】开放数据结构，Open Data Structures，337页pdf

【干货书】开放数据结构，Open Data Structures，337页pdf

专知会员服务

17+阅读 · 2021年9月17日

【经典书】线性代数，436页pdf

专知会员服务

77+阅读 · 2021年3月16日

NLP必读经典文献100篇

专知会员服务

124+阅读 · 2020年9月8日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

81+阅读 · 2020年7月26日

【实用书】数据科学基础，484页pdf，Foundations of Data Science

【实用书】数据科学基础，484页pdf，Foundations of Data Science

专知会员服务

122+阅读 · 2020年5月28日

【2020新书】Python大数据处理，Mastering Large Datasets with Python，311页pdf

【2020新书】Python大数据处理，Mastering Large Datasets with Python，311页pdf

专知会员服务

196+阅读 · 2020年2月1日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【Python Tricks新书】The book: A Buffet of Awesome Python Features，299页pdf

【Python Tricks新书】The book: A Buffet of Awesome Python Features，299页pdf

专知会员服务

45+阅读 · 2020年1月1日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【TED】生命中的每一年的智慧

【TED】生命中的每一年的智慧

英语演讲视频每日一推

10+阅读 · 2019年1月29日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

LibRec 精选：连通知识图谱与推荐系统

LibRec 精选：连通知识图谱与推荐系统

LibRec智能推荐

3+阅读 · 2018年8月9日

【推荐】自然语言处理（NLP）指南

【推荐】自然语言处理（NLP）指南

机器学习研究会

35+阅读 · 2017年11月17日

[DLdigest-8] 每日一道算法

[DLdigest-8] 每日一道算法

深度学习每日摘要

4+阅读 · 2017年11月2日

可解释的CNN

可解释的CNN

CreateAMind

17+阅读 · 2017年10月5日

【推荐】用Tensorflow理解LSTM

【推荐】用Tensorflow理解LSTM

机器学习研究会

36+阅读 · 2017年9月11日

【音乐】Attention

【音乐】Attention

英语演讲视频每日一推

3+阅读 · 2017年8月22日

【推荐】TensorFlow手把手CNN实践指南

【推荐】TensorFlow手把手CNN实践指南

机器学习研究会

5+阅读 · 2017年8月17日

Particle filter efficiency under limited communication

Arxiv

0+阅读 · 2021年10月28日

SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers

SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers

Arxiv

0+阅读 · 2021年10月28日

Analysis of COVID-19 in Japan with Extended SEIR model and ensemble Kalman filter

Analysis of COVID-19 in Japan with Extended SEIR model and ensemble Kalman filter

Arxiv

1+阅读 · 2021年10月28日

Efficient Algorithms and Implementation of a Semiparametric Joint Model for Longitudinal and Competing Risks Data: With Applications to Massive Biobank Data

Arxiv

0+阅读 · 2021年10月28日

Mining frequency-based sequential trajectory co-clusters

Arxiv

0+阅读 · 2021年10月27日

On Success and Simplicity: A Second Look at Transferable Targeted Attacks

Arxiv

0+阅读 · 2021年10月26日

Deep Metric Transfer for Label Propagation with Limited Annotated Data

Arxiv

3+阅读 · 2018年12月20日

Towards Scalable Spectral Clustering via Spectrum-Preserving Sparsification

Towards Scalable Spectral Clustering via Spectrum-Preserving Sparsification

Arxiv

4+阅读 · 2018年10月11日

Strong Baselines for Simple Question Answering over Knowledge Graphs with and without Neural Networks

Arxiv

17+阅读 · 2018年6月5日

Big Data: Understanding Big Data

Arxiv

6+阅读 · 2016年1月15日

VIP会员

文章信息

相关主题

相关VIP内容

【干货书】开放数据结构，Open Data Structures，337页pdf

【干货书】开放数据结构，Open Data Structures，337页pdf

专知会员服务

17+阅读 · 2021年9月17日

【经典书】线性代数，436页pdf

专知会员服务

77+阅读 · 2021年3月16日

NLP必读经典文献100篇

专知会员服务

124+阅读 · 2020年9月8日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

81+阅读 · 2020年7月26日

【实用书】数据科学基础，484页pdf，Foundations of Data Science

【实用书】数据科学基础，484页pdf，Foundations of Data Science

专知会员服务

122+阅读 · 2020年5月28日

【2020新书】Python大数据处理，Mastering Large Datasets with Python，311页pdf

【2020新书】Python大数据处理，Mastering Large Datasets with Python，311页pdf

专知会员服务

196+阅读 · 2020年2月1日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【Python Tricks新书】The book: A Buffet of Awesome Python Features，299页pdf

【Python Tricks新书】The book: A Buffet of Awesome Python Features，299页pdf

专知会员服务

45+阅读 · 2020年1月1日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

热门VIP内容

开通专知VIP会员享更多权益服务

新书册《几何深度学习的数学基础》

中程单向攻击无人机的战略意义：俄乌战争启示

在无标注条件下适配视觉—语言模型：全面综述

面向视觉语言模型的持续学习：遗忘之外的综述与分类体系

相关资讯

【TED】生命中的每一年的智慧

【TED】生命中的每一年的智慧

英语演讲视频每日一推

10+阅读 · 2019年1月29日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

LibRec 精选：连通知识图谱与推荐系统

LibRec 精选：连通知识图谱与推荐系统

LibRec智能推荐

3+阅读 · 2018年8月9日

【推荐】自然语言处理（NLP）指南

【推荐】自然语言处理（NLP）指南

机器学习研究会

35+阅读 · 2017年11月17日

[DLdigest-8] 每日一道算法

[DLdigest-8] 每日一道算法

深度学习每日摘要

4+阅读 · 2017年11月2日

可解释的CNN

可解释的CNN

CreateAMind

17+阅读 · 2017年10月5日

【推荐】用Tensorflow理解LSTM

【推荐】用Tensorflow理解LSTM

机器学习研究会

36+阅读 · 2017年9月11日

【音乐】Attention

【音乐】Attention

英语演讲视频每日一推

3+阅读 · 2017年8月22日

【推荐】TensorFlow手把手CNN实践指南

【推荐】TensorFlow手把手CNN实践指南

机器学习研究会

5+阅读 · 2017年8月17日

相关论文

Particle filter efficiency under limited communication

Arxiv

0+阅读 · 2021年10月28日

SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers

SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers

Arxiv

0+阅读 · 2021年10月28日

Analysis of COVID-19 in Japan with Extended SEIR model and ensemble Kalman filter

Analysis of COVID-19 in Japan with Extended SEIR model and ensemble Kalman filter

Arxiv

1+阅读 · 2021年10月28日

Efficient Algorithms and Implementation of a Semiparametric Joint Model for Longitudinal and Competing Risks Data: With Applications to Massive Biobank Data

Arxiv

0+阅读 · 2021年10月28日

Mining frequency-based sequential trajectory co-clusters

Arxiv

0+阅读 · 2021年10月27日

On Success and Simplicity: A Second Look at Transferable Targeted Attacks

Arxiv

0+阅读 · 2021年10月26日

Deep Metric Transfer for Label Propagation with Limited Annotated Data

Arxiv

3+阅读 · 2018年12月20日

Towards Scalable Spectral Clustering via Spectrum-Preserving Sparsification

Towards Scalable Spectral Clustering via Spectrum-Preserving Sparsification

Arxiv

4+阅读 · 2018年10月11日

Strong Baselines for Simple Question Answering over Knowledge Graphs with and without Neural Networks

Arxiv

17+阅读 · 2018年6月5日

Big Data: Understanding Big Data

Arxiv

6+阅读 · 2016年1月15日

微信扫码咨询专知VIP会员