Simple Set Sketching - Zhuanzhi (专知) Papers


November 7, 2022

Simple Set Sketching


Jakob Bæk Tejs Houen, Rasmus Pagh, Stefan Walzer

From arXiv; to be published at the SIAM Symposium on Simplicity in Algorithms (SOSA23).

Imagine handling collisions in a hash table by storing, in each cell, the bit-wise exclusive-or of the set of keys hashing there. This appears to be a terrible idea: For $\alpha n$ keys and $n$ buckets, where $\alpha$ is constant, we expect that a constant fraction of the keys will be unrecoverable due to collisions. We show that if this collision resolution strategy is repeated three times independently the situation reverses: If $\alpha$ is below a threshold of $\approx 0.81$ then we can recover the set of all inserted keys in linear time with high probability. Even though the description of our data structure is simple, its analysis is nontrivial. Our approach can be seen as a variant of the Invertible Bloom Filter (IBF) of Eppstein and Goodrich. While IBFs involve an explicit checksum per bucket to decide whether the bucket stores a single key, we exploit the idea of quotienting, namely that some bits of the key are implicit in the location where it is stored. We let those serve as an implicit checksum. These bits are not quite enough to ensure that no errors occur and the main technical challenge is to show that decoding can recover from these errors.
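The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions that the abstract does not spell out: the three repetitions are modeled as three sub-tables with one hash function each, keys are nonzero integers, and the "implicit checksum" is modeled as a check that the value stored in a cell hashes back to that same cell. The names `XorSketch`, `toggle`, `looks_pure`, `decode`, and the `_mix` hash are hypothetical. A cell that passes the check may still hold the XOR of several keys, so the decoder toggles keys in and out of the result instead of only adding them, and it gives up after a step budget.

```python
import random

MASK = (1 << 64) - 1


def _mix(x):
    # splitmix64-style finalizer, used here as a stand-in hash (an assumption).
    x = (x + 0x9E3779B97F4A7C15) & MASK
    x = ((x ^ (x >> 30)) * 0xBF58476D1CE4E5B9) & MASK
    x = ((x ^ (x >> 27)) * 0x94D049BB133111EB) & MASK
    return x ^ (x >> 31)


class XorSketch:
    """Three sub-tables; each cell stores the XOR of the keys hashing there."""

    def __init__(self, cells_per_table, seed=0):
        self.m = cells_per_table
        rng = random.Random(seed)
        # One salt per sub-table stands in for three independent hash functions.
        self.salts = [rng.getrandbits(64) for _ in range(3)]
        self.tables = [[0] * self.m for _ in range(3)]

    def _cell(self, t, key):
        return _mix(key ^ self.salts[t]) % self.m

    def toggle(self, key):
        """Insert a nonzero integer key; toggling it again deletes it."""
        for t in range(3):
            self.tables[t][self._cell(t, key)] ^= key

    def looks_pure(self, tables, t, i):
        # Implicit checksum: the stored value must hash back to this cell.
        x = tables[t][i]
        return x != 0 and self._cell(t, x) == i

    def decode(self, max_steps=None):
        """Peel cells that look pure; return the key set, or None on failure."""
        tables = [row[:] for row in self.tables]  # decode on a copy
        if max_steps is None:
            max_steps = 60 * self.m  # arbitrary budget for this sketch
        recovered = set()
        queue = [(t, i) for t in range(3) for i in range(self.m)]
        steps = 0
        while queue and steps < max_steps:
            t, i = queue.pop()
            if not self.looks_pure(tables, t, i):
                continue
            x = tables[t][i]
            steps += 1
            # Toggle rather than add: a wrongly peeled key can be peeled
            # a second time later, which undoes the error.
            recovered ^= {x}
            for u in range(3):
                j = self._cell(u, x)
                tables[u][j] ^= x
                queue.append((u, j))
        done = all(v == 0 for row in tables for v in row)
        return recovered if done else None
```

To try it, create `XorSketch(256)`, call `toggle` on a few dozen distinct nonzero integers, and compare the result of `decode()` with the inserted set. At loads far below the threshold of about 0.81 stated in the abstract, the two should agree with high probability; the sketch makes no attempt to reproduce the paper's analysis or its exact parameters.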

