A Sequential Addressing Subsampling Method for Massive Data Analysis under Memory Constraint


October 3, 2021


Rui Pan, Yingqiu Zhu, Baishan Guo, Xuening Zhu, Hansheng Wang

The emergence of massive data in recent years brings challenges to automatic statistical inference. This is particularly true if the data are too numerous to be read into memory as a whole. Accordingly, new sampling techniques are needed to sample data from a hard drive. In this paper, we propose a sequential addressing subsampling (SAS) method that can sample data directly from the hard drive. The newly proposed SAS method incurs a lower addressing cost, and therefore takes less time, than the random addressing subsampling (RAS) method. Estimators (e.g., the sample mean) based on the SAS subsamples are constructed, and their properties are studied. We conduct a series of simulation studies to verify the finite-sample performance of the proposed SAS estimators. The time cost of the SAS and RAS methods is also compared. An analysis of the airline data is presented for illustration.
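The contrast between the two addressing schemes can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that records are stored as fixed-width binary values, that RAS performs one random seek per sampled record, and that SAS performs a single random seek and then reads a contiguous run of records. The names `write_records`, `ras_mean`, and `sas_mean` are hypothetical and introduced only for this example.

```python
import os
import random
import struct
import tempfile

# Assumed layout: one little-endian float64 per record (fixed width).
RECORD = struct.Struct("<d")


def write_records(path, values):
    """Write values to disk as fixed-width binary records."""
    with open(path, "wb") as f:
        for v in values:
            f.write(RECORD.pack(v))


def ras_mean(path, n, rng):
    """Random addressing: one seek per sampled record, n seeks in total."""
    total_records = os.path.getsize(path) // RECORD.size
    total = 0.0
    with open(path, "rb") as f:
        for _ in range(n):
            idx = rng.randrange(total_records)  # sampled with replacement
            f.seek(idx * RECORD.size)
            total += RECORD.unpack(f.read(RECORD.size))[0]
    return total / n


def sas_mean(path, n, rng):
    """Sequential addressing: one random seek, then n contiguous records."""
    total_records = os.path.getsize(path) // RECORD.size
    start = rng.randrange(total_records - n + 1)  # random starting record
    with open(path, "rb") as f:
        f.seek(start * RECORD.size)
        block = f.read(n * RECORD.size)
    values = [v[0] for v in RECORD.iter_unpack(block)]
    return sum(values) / n


if __name__ == "__main__":
    rng = random.Random(2021)
    data = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
    # Shuffled in memory only for this small demo (an assumption of the sketch).
    rng.shuffle(data)
    fd, path = tempfile.mkstemp(suffix=".bin")
    os.close(fd)
    try:
        write_records(path, data)
        print("RAS sample mean:", ras_mean(path, 1_000, rng))
        print("SAS sample mean:", sas_mean(path, 1_000, rng))
    finally:
        os.remove(path)
```

In this sketch the contiguous block read by `sas_mean` behaves like a random subsample only when the on-disk order is unrelated to the values, which is why the demo shuffles the data before writing. For data that truly exceed memory, that ordering would have to be arranged by other means.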

