An $\varepsilon$-approximate quantile sketch over a stream of $n$ inputs approximates the rank of any query point $q$ (that is, the number of input points less than $q$) up to an additive error of $\varepsilon n$, typically with probability at least $1 - 1/\mathrm{poly}(n)$, while consuming $o(n)$ space. While the celebrated KLL sketch of Karnin, Lang, and Liberty is provably optimal over worst-case streams, the approximations it achieves in practice are often far from the best attainable. Indeed, the most commonly used technique in practice is Dunning's t-digest, which often achieves much better approximations than KLL on real-world data but is known to suffer arbitrarily large errors in the worst case. We apply interpolation techniques to the streaming quantiles problem, aiming to achieve better approximations than KLL on real-world data sets while maintaining similar worst-case guarantees.
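To make the guarantee concrete: a sketch's estimate $\widehat{\mathrm{rank}}(q)$ must satisfy $|\widehat{\mathrm{rank}}(q) - \mathrm{rank}(q)| \le \varepsilon n$. The following Python sketch is a hypothetical baseline, not KLL or t-digest: it estimates rank from a uniform reservoir sample, which meets an $\varepsilon n$ error bound with high probability only when the sample size is on the order of $1/\varepsilon^2$.

```python
import random

def true_rank(xs, q):
    # Exact rank: number of stream elements strictly less than q.
    return sum(1 for x in xs if x < q)

def sample_sketch(stream, k, seed=0):
    # Reservoir-sample k points uniformly from the stream.
    # (Illustrative baseline only; far weaker than KLL's compactor scheme.)
    rng = random.Random(seed)
    sample = []
    n = 0
    for x in stream:
        n += 1
        if len(sample) < k:
            sample.append(x)
        else:
            j = rng.randrange(n)
            if j < k:
                sample[j] = x
    return sample, n

def est_rank(sketch, q):
    # Scale the sample rank back up to an estimate of the stream rank.
    sample, n = sketch
    below = sum(1 for x in sample if x < q)
    return below * n / len(sample)

stream = list(range(10000))
random.Random(1).shuffle(stream)
sk = sample_sketch(stream, k=2000)
q = 2500
err = abs(est_rank(sk, q) - true_rank(stream, q))
# For eps = 0.05 and n = 10000, the target is err <= eps * n = 500;
# a sample of k = 2000 achieves this with overwhelming probability.
```

KLL improves on this baseline by replacing uniform sampling with a hierarchy of compactors, bringing the space for a fixed $\varepsilon$ down from roughly $1/\varepsilon^2$ to roughly $1/\varepsilon$ (up to logarithmic factors).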