用于对三万倍规模共变矩阵进行在线粗略估计的主动抽样算数缩略图(ASCS) (Active Sampling Count Sketch (ASCS) for Online Sparse Estimation of a Trillion Scale Covariance Matrix) - 专知论文

会员服务 ·

0

估计/估计量 · 协方差矩阵 · 稀疏 · 缩放 · 计算机科学 ·

2021 年 6 月 11 日

Active Sampling Count Sketch (ASCS) for Online Sparse Estimation of a Trillion Scale Covariance Matrix

翻译：用于对三万倍规模共变矩阵进行在线粗略估计的主动抽样算数缩略图(ASCS)

Zhenwei Dai,Aditya Desai,Reinhard Heckel,Anshumali Shrivastava

from arxiv, 13 pages

Estimating and storing the covariance (or correlation) matrix of high-dimensional data is computationally challenging because both memory and computational requirements scale quadratically with the dimension. Fortunately, high-dimensional covariance matrices as observed in text, click-through, meta-genomics datasets, etc are often sparse. In this paper, we consider the problem of efficient sparse estimation of covariance matrices with possibly trillions of entries. The size of the datasets we target requires the algorithm to be online, as more than one pass over the data is prohibitive. In this paper, we propose Active Sampling Count Sketch (ASCS), an online and one-pass sketching algorithm, that recovers the large entries of the covariance matrix accurately. Count Sketch (CS), and other sub-linear compressed sensing algorithms, offer a natural solution to the problem in theory. However, vanilla CS does not work well in practice due to a low signal-to-noise ratio (SNR). At the heart of our approach is a novel active sampling strategy that increases the SNR of classical CS. We demonstrate the practicality of our algorithm with synthetic data and real-world high dimensional datasets. ASCS significantly improves over vanilla CS, demonstrating the merit of our active sampling strategy.

翻译：估算和储存高维数据的共变(或相关关系)矩阵在计算上具有挑战性,因为记忆和计算要求的尺度都与维度相仿。幸运的是,在文本中观察到的高维共变矩阵、点击通、元基因组数据集等往往很少。在本文件中,我们考虑了以数万亿个条目对共变矩阵进行高效少估的问题。我们的目标数据集的大小要求算法是在线的,因为超过数据的一个通过量令人望而却望而却步。在本文中,我们建议采用一种在线和一流的绘图算法,即主动采集共变数矩阵的大条目。计数Schach(CS)和其他子线性压缩测算算算法,为理论问题提供了自然的解决方案。然而,由于信号到噪音比率低(SNRR),vanilla CS在实际操作上并不奏效。我们的方法的核心是新的积极取样战略,这增加了CNS的S级CR。我们展示了我们对SS的高度数据进行高水平的合成算法。

0

相关内容

估计/估计量

估计/估计量

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

专知会员服务

135+阅读 · 2021年6月16日

《算法凸几何》简明书，Algorithmic Convex Geometry，50页pdf

专知会员服务

42+阅读 · 2021年4月2日

【干货书】机器学习速查手册，135页pdf

【干货书】机器学习速查手册，135页pdf

专知会员服务

127+阅读 · 2020年11月20日

不可错过！UIUC最新《统计强化学习》课程！

专知会员服务

53+阅读 · 2020年9月7日

(普林斯顿讲义)：高维概率论，326页pdf《Probability in High Dimension》

(普林斯顿讲义)：高维概率论，326页pdf《Probability in High Dimension》

专知会员服务

122+阅读 · 2020年5月30日

【硬核书】数学博弈论与应用，431页pdf，Mathematical Game Theory and Applications

【硬核书】数学博弈论与应用，431页pdf，Mathematical Game Theory and Applications

专知会员服务

170+阅读 · 2020年4月18日

【新书】Python机器学习实战，545页pdf，Practical Machine Learning with Python

【新书】Python机器学习实战，545页pdf，Practical Machine Learning with Python

专知会员服务

310+阅读 · 2020年2月26日

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

专知会员服务

244+阅读 · 2019年10月21日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

【NIPS2018】接收论文列表

【NIPS2018】接收论文列表

专知

5+阅读 · 2018年9月10日

【SIGIR2018】五篇对抗训练文章

【SIGIR2018】五篇对抗训练文章

专知

12+阅读 · 2018年7月9日

计算机视觉领域顶会CVPR 2018 接受论文列表

计算机视觉领域顶会CVPR 2018 接受论文列表

专知

7+阅读 · 2018年5月26日

【论文推荐】最新五篇命名实体识别相关论文—深度主动学习、Lattice LSTM、混合马尔可夫CRF

【论文推荐】最新五篇命名实体识别相关论文—深度主动学习、Lattice LSTM、混合马尔可夫CRF

专知

26+阅读 · 2018年5月22日

计算文本相似度常用的四种方法

计算文本相似度常用的四种方法

论智

33+阅读 · 2018年5月18日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

Adversarial Variational Bayes: Unifying VAE and GAN 代码

Adversarial Variational Bayes: Unifying VAE and GAN 代码

CreateAMind

7+阅读 · 2017年10月4日

Identification in Bayesian Estimation of the Skewness Matrix in a Multivariate Skew-Elliptical Distribution

Identification in Bayesian Estimation of the Skewness Matrix in a Multivariate Skew-Elliptical Distribution

Arxiv

0+阅读 · 2021年8月9日

Nonlinear Level Set Learning for Function Approximation on Sparse Data with Applications to Parametric Differential Equations

Arxiv

0+阅读 · 2021年8月7日

Central limit theorems for high dimensional dependent data

Arxiv

0+阅读 · 2021年8月7日

The Causal Learning of Retail Delinquency

Arxiv

14+阅读 · 2020年12月17日

Testing Matrix Rank, Optimally

Arxiv

3+阅读 · 2018年10月18日

Large Scale Local Online Similarity/Distance Learning Framework based on Passive/Aggressive

Arxiv

5+阅读 · 2018年4月5日

Geometry in Active Learning for Binary and Multi-class Image Segmentation

Arxiv

9+阅读 · 2018年1月16日

Practical sketching algorithms for low-rank matrix approximation

Arxiv

4+阅读 · 2018年1月2日

Variance-based regularization with convex objectives

Arxiv

5+阅读 · 2017年12月14日

A three domain covariance framework for EEG/MEG data

Arxiv

3+阅读 · 2014年10月9日

VIP会员

文章信息

相关主题

估计/估计量

协方差矩阵

计算机科学

相关VIP内容

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

专知会员服务

135+阅读 · 2021年6月16日

《算法凸几何》简明书，Algorithmic Convex Geometry，50页pdf

专知会员服务

42+阅读 · 2021年4月2日

【干货书】机器学习速查手册，135页pdf

【干货书】机器学习速查手册，135页pdf

专知会员服务

127+阅读 · 2020年11月20日

不可错过！UIUC最新《统计强化学习》课程！

专知会员服务

53+阅读 · 2020年9月7日

(普林斯顿讲义)：高维概率论，326页pdf《Probability in High Dimension》

(普林斯顿讲义)：高维概率论，326页pdf《Probability in High Dimension》

专知会员服务

122+阅读 · 2020年5月30日

【硬核书】数学博弈论与应用，431页pdf，Mathematical Game Theory and Applications

【硬核书】数学博弈论与应用，431页pdf，Mathematical Game Theory and Applications

专知会员服务

170+阅读 · 2020年4月18日

【新书】Python机器学习实战，545页pdf，Practical Machine Learning with Python

【新书】Python机器学习实战，545页pdf，Practical Machine Learning with Python

专知会员服务

310+阅读 · 2020年2月26日

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

专知会员服务

244+阅读 · 2019年10月21日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《复杂工程系统模型驱动设计决策支持系统：早期设计阶段挑战》最新138页

《日本陆上自卫队2040年作战方式与未来作战研究》最新23页slides

人工智能作为战争武器

《后勤保障》最新23页

相关资讯

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

【NIPS2018】接收论文列表

【NIPS2018】接收论文列表

专知

5+阅读 · 2018年9月10日

【SIGIR2018】五篇对抗训练文章

【SIGIR2018】五篇对抗训练文章

专知

12+阅读 · 2018年7月9日

计算机视觉领域顶会CVPR 2018 接受论文列表

计算机视觉领域顶会CVPR 2018 接受论文列表

专知

7+阅读 · 2018年5月26日

【论文推荐】最新五篇命名实体识别相关论文—深度主动学习、Lattice LSTM、混合马尔可夫CRF

【论文推荐】最新五篇命名实体识别相关论文—深度主动学习、Lattice LSTM、混合马尔可夫CRF

专知

26+阅读 · 2018年5月22日

计算文本相似度常用的四种方法

计算文本相似度常用的四种方法

论智

33+阅读 · 2018年5月18日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

Adversarial Variational Bayes: Unifying VAE and GAN 代码

Adversarial Variational Bayes: Unifying VAE and GAN 代码

CreateAMind

7+阅读 · 2017年10月4日

相关论文

Identification in Bayesian Estimation of the Skewness Matrix in a Multivariate Skew-Elliptical Distribution

Identification in Bayesian Estimation of the Skewness Matrix in a Multivariate Skew-Elliptical Distribution

Arxiv

0+阅读 · 2021年8月9日

Nonlinear Level Set Learning for Function Approximation on Sparse Data with Applications to Parametric Differential Equations

Arxiv

0+阅读 · 2021年8月7日

Central limit theorems for high dimensional dependent data

Arxiv

0+阅读 · 2021年8月7日

The Causal Learning of Retail Delinquency

Arxiv

14+阅读 · 2020年12月17日

Testing Matrix Rank, Optimally

Arxiv

3+阅读 · 2018年10月18日

Large Scale Local Online Similarity/Distance Learning Framework based on Passive/Aggressive

Arxiv

5+阅读 · 2018年4月5日

Geometry in Active Learning for Binary and Multi-class Image Segmentation

Arxiv

9+阅读 · 2018年1月16日

Practical sketching algorithms for low-rank matrix approximation

Arxiv

4+阅读 · 2018年1月2日

Variance-based regularization with convex objectives

Arxiv

5+阅读 · 2017年12月14日

A three domain covariance framework for EEG/MEG data

Arxiv

3+阅读 · 2014年10月9日

微信扫码咨询专知VIP会员