SetSketch: Filling the Gap between MinHash and HyperLogLog

2021 年 8 月 11 日

From arXiv; VLDB 2021, extended version, 22 pages

MinHash and HyperLogLog are sketching algorithms that have become indispensable for set summaries in big data applications. While HyperLogLog allows counting different elements with very little space, MinHash is suitable for the fast comparison of sets as it allows estimating the Jaccard similarity and other joint quantities. This work presents a new data structure called SetSketch that is able to continuously fill the gap between both use cases. Its commutative and idempotent insert operation and its mergeable state make it suitable for distributed environments. Fast, robust, and easy-to-implement estimators for cardinality and joint quantities, as well as the ability to use SetSketch for similarity search, enable versatile applications. The presented joint estimator can also be applied to other data structures such as MinHash, HyperLogLog, or HyperMinHash, where it even performs better than the corresponding state-of-the-art estimators in many cases.
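The properties the abstract names (a commutative and idempotent insert, a mergeable state, and Jaccard estimation) can be illustrated with a toy MinHash. The sketch below is not SetSketch and not the paper's joint estimator; the class name `MinHashSketch`, the register count, and the salted BLAKE2b hashing scheme are all assumptions chosen for illustration.

```python
import hashlib


class MinHashSketch:
    """Toy MinHash (illustrative only): m registers, each keeping a minimum hash."""

    def __init__(self, m=128):
        self.m = m
        # Sentinel larger than any 64-bit hash value, meaning "register is empty".
        self.registers = [2**64] * m

    def _hash(self, item, i):
        # Assumed hashing scheme: one 64-bit BLAKE2b hash per register,
        # made independent by salting with the register index.
        digest = hashlib.blake2b(
            str(item).encode("utf-8"),
            digest_size=8,
            salt=i.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little")

    def insert(self, item):
        # Taking a minimum is commutative and idempotent, so insertion
        # order and duplicate inserts do not change the state.
        for i in range(self.m):
            h = self._hash(item, i)
            if h < self.registers[i]:
                self.registers[i] = h

    def merge(self, other):
        # The element-wise minimum yields the sketch of the set union.
        merged = MinHashSketch(self.m)
        merged.registers = [min(a, b) for a, b in zip(self.registers, other.registers)]
        return merged

    def jaccard(self, other):
        # The fraction of equal registers estimates the Jaccard similarity.
        equal = sum(a == b for a, b in zip(self.registers, other.registers))
        return equal / self.m


# Usage: two overlapping sets whose true Jaccard similarity is 1/3.
left, right = MinHashSketch(), MinHashSketch()
for x in range(0, 300):
    left.insert(x)
for x in range(150, 450):
    right.insert(x)
print("estimated Jaccard:", left.jaccard(right))
```

Because every register only ever takes a minimum, sketches built on different machines can be combined with `merge` in any order, which is the kind of behaviour the abstract describes as suitable for distributed environments.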

