Multiple Imputation with Massive Data: An Application to the Panel Study of Income Dynamics


Extensibility · composite data · multicollinearity · INFORMS · correlation coefficient

April 21, 2021


Yajuan Si, Steve Heeringa, David Johnson, Roderick Little, Wenshuo Liu, Fabian Pfeffer, Trivellore Raghunathan

Multiple imputation (MI) is a popular and well-established method for handling missing data in multivariate data sets, but its practicality for use in massive and complex data sets has been questioned. One such data set is the Panel Study of Income Dynamics (PSID), a longstanding and extensive survey of household income and wealth in the United States. Missing data for this survey are currently handled using traditional hot deck methods. We use a sequential regression/chained-equation approach, implemented in the software IVEware, to multiply impute cross-sectional wealth data in the 2013 PSID, and compare analyses of the resulting imputed data with results from the current hot deck approach. Practical difficulties, such as non-normally distributed variables, skip patterns, categorical variables with many levels, and multicollinearity, are described together with our approaches to overcoming them. We evaluate the imputation quality and validity with internal diagnostics and external benchmarking data. MI improves on the existing hot deck approach, helping to preserve correlation structures while yielding efficiency gains. We recommend the practical implementation of MI and expect greater gains when the fraction of missing information is large.
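The abstract does not describe IVEware's internals, so the following is only a minimal sketch of the two ideas it relies on. The first is sequential regression (chained-equation) imputation: each incomplete variable is regressed on all the others, and its missing entries are redrawn from that model with parameter and residual uncertainty. The second is Rubin's combining rules, which pool the M completed-data analyses and yield the fraction of missing information (FMI). The sketch assumes that every variable is numeric and that every incomplete variable is modeled by a normal linear regression. It ignores the bounds, skip patterns, and many-level categorical variables the authors handle. The names `chained_impute` and `rubin_combine` are hypothetical, are not IVEware's API, and the data in the example are simulated for illustration.

```python
import numpy as np


def _draw_regression(Xd, y, rng):
    """Draw (beta, sigma) from the flat-prior posterior of a normal linear
    regression, so imputations reflect uncertainty in the model parameters."""
    n, p = Xd.shape
    beta_hat, _, _, _ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta_hat
    sigma2 = resid @ resid / rng.chisquare(n - p)  # scaled inverse chi-square draw
    cov = sigma2 * np.linalg.inv(Xd.T @ Xd)
    beta = rng.multivariate_normal(beta_hat, cov)
    return beta, np.sqrt(sigma2)


def chained_impute(data, n_iter=10, rng=None):
    """Return one completed copy of `data` (2-D float array, NaN = missing).
    Assumption: every incomplete column gets a normal linear model."""
    rng = np.random.default_rng() if rng is None else rng
    X = data.copy()
    miss = np.isnan(X)
    # Start from observed column means so every regression has complete predictors.
    col_means = np.nanmean(X, axis=0)
    X[miss] = np.take(col_means, np.where(miss)[1])
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            m = miss[:, j]
            if not m.any():
                continue  # fully observed column: nothing to impute
            others = np.delete(X, j, axis=1)
            Xd = np.column_stack([np.ones(len(X)), others])
            beta, sigma = _draw_regression(Xd[~m], X[~m, j], rng)
            # Prediction plus residual noise, not just the fitted mean.
            X[m, j] = Xd[m] @ beta + rng.normal(0.0, sigma, m.sum())
    return X


def rubin_combine(estimates, variances):
    """Pool M point estimates and their within-imputation variances
    (Rubin 1987); returns (pooled estimate, total variance, FMI)."""
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    M = len(q)
    q_bar = q.mean()
    u_bar = u.mean()                 # within-imputation variance
    b = q.var(ddof=1)                # between-imputation variance
    t = u_bar + (1 + 1 / M) * b      # total variance
    if b == 0:
        return q_bar, t, 0.0         # imputations agree: no missing information
    r = (1 + 1 / M) * b / u_bar      # relative increase in variance
    nu = (M - 1) * (1 + 1 / r) ** 2  # degrees of freedom
    fmi = (r + 2 / (nu + 3)) / (r + 1)
    return q_bar, t, fmi


if __name__ == "__main__":
    # Simulated (hypothetical) data: y depends on x and is more often
    # missing when x is large, i.e. missing at random given x.
    rng = np.random.default_rng(2013)
    n = 2000
    x = rng.normal(size=n)
    y = 2.0 + 3.0 * x + rng.normal(size=n)
    p_miss = 1 / (1 + np.exp(-(x - 0.5)))
    y_obs = np.where(rng.uniform(size=n) < p_miss, np.nan, y)
    data = np.column_stack([x, y_obs])

    M = 5
    ests, vars_ = [], []
    for _ in range(M):
        yi = chained_impute(data, rng=rng)[:, 1]
        ests.append(yi.mean())
        vars_.append(yi.var(ddof=1) / n)  # variance of the sample mean
    q_bar, t, fmi = rubin_combine(ests, vars_)
    print(f"pooled mean of y: {q_bar:.3f} (SE {np.sqrt(t):.3f}), FMI ~ {fmi:.2f}")
```

Each of the M copies draws its own regression parameters and residuals. As a result, the spread of the M estimates (the between-imputation variance B) carries the imputation uncertainty, and the FMI grows with B. A single deterministic fill-in would set B to zero and understate the total variance.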

