Analysis of a new Cardinality Estimator -- ExtendedHyperLogLog

June 11, 2021

Tal Ohayon, Aryeh Kontorovich

We discuss the problem of counting distinct elements in a stream. A stream is usually considered as a sequence of elements that arrive one at a time. An exact solution to the problem requires memory on the order of the size of the stream. For many applications this solution is infeasible because the streams are very large. The solution in that case is to use a probabilistic data structure (also called a sketch), from which we can estimate the cardinality of the stream with high accuracy. We present a new algorithm, ExtendedHyperLogLog (EHLL), which is based on the state-of-the-art algorithm, HyperLogLog (HLL). To achieve the same accuracy as HLL, EHLL uses 16% less memory. In recent years, a martingale approach has been developed. In the martingale setting we obtain better accuracy at the price of not being able to merge sketches. EHLL also works in the martingale setting. Martingale EHLL achieves the same accuracy as Martingale HLL using 12% less memory.
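The abstract does not describe how EHLL works internally, so the following is only a minimal sketch of the baseline it builds on: textbook HyperLogLog, plus a martingale (historic inverse probability) variant that shows the accuracy-versus-mergeability trade-off mentioned above. The class names, the choice of BLAKE2b as the hash function, and the default of 2^12 registers are assumptions made for illustration, not details taken from the paper.

```python
import hashlib
import math


class HyperLogLog:
    """Textbook HyperLogLog (not EHLL) with m = 2**p registers."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m
        # Bias-correction constant from the standard HLL analysis (m >= 128).
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def _locate(self, item):
        """Map an item to (register index, rank) using a 64-bit hash."""
        digest = hashlib.blake2b(str(item).encode("utf-8"), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        idx = h >> (64 - self.p)                # first p bits select the register
        rest = h & ((1 << (64 - self.p)) - 1)   # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        return idx, rank

    def add(self, item):
        idx, rank = self._locate(item)
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def estimate(self):
        raw = self.alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros > 0:
            # Small-range correction: fall back to linear counting.
            return self.m * math.log(self.m / zeros)
        return raw

    def merge(self, other):
        """Union of two sketches: element-wise maximum of the registers."""
        if self.p != other.p:
            raise ValueError("sketches must use the same number of registers")
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]


class MartingaleHLL(HyperLogLog):
    """HLL registers with a running martingale estimate; it cannot be merged."""

    def __init__(self, p=12):
        super().__init__(p)
        self.count = 0.0
        self.change_prob = 1.0  # chance that a new distinct item changes a register

    def add(self, item):
        idx, rank = self._locate(item)
        old = self.registers[idx]
        if rank > old:
            # Each state change is weighted by the inverse of its probability.
            self.count += 1.0 / self.change_prob
            self.change_prob -= (2.0 ** -old - 2.0 ** -rank) / self.m
            self.registers[idx] = rank

    def estimate(self):
        return self.count

    def merge(self, other):
        raise NotImplementedError("the running estimate depends on insertion history")


if __name__ == "__main__":
    hll, mhll = HyperLogLog(), MartingaleHLL()
    for i in range(100_000):
        hll.add(f"user-{i}")
        mhll.add(f"user-{i}")
    print(round(hll.estimate()), round(mhll.estimate()))
```

Both classes store the same 4096 small registers regardless of stream length. The plain sketch can be combined across machines with `merge`, while the martingale variant trades that ability for a running estimate with lower variance.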
