Causal Inference with Corrupted Data: Measurement Error, Missing Values, Discretization, and Differential Privacy

Discretization · Inference · Approximation · Statistics · Confidence

December 10, 2021

Anish Agarwal, Rahul Singh

From arXiv, 136 pages.

Even the most carefully curated economic data sets have variables that are noisy, missing, discretized, or privatized. The standard workflow for empirical research involves data cleaning followed by data analysis that typically ignores the bias and variance consequences of data cleaning. We formulate a semiparametric model for causal inference with corrupted data to encompass both data cleaning and data analysis. We propose a new end-to-end procedure for data cleaning, estimation, and inference with data cleaning-adjusted confidence intervals. We prove consistency, Gaussian approximation, and semiparametric efficiency for our estimator of the causal parameter by finite sample arguments. The rate of Gaussian approximation is $n^{-1/2}$ for global parameters such as average treatment effect, and it degrades gracefully for local parameters such as heterogeneous treatment effect for a specific demographic. Our key assumption is that the true covariates are approximately low rank. In our analysis, we provide nonasymptotic theoretical contributions to matrix completion, statistical learning, and semiparametric statistics. We verify the coverage of the data cleaning-adjusted confidence intervals in simulations calibrated to resemble differential privacy as implemented in the 2020 US Census.
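The abstract describes a two-stage pipeline: clean corrupted covariates under the assumption that the true covariates are approximately low rank, then estimate a causal parameter such as the average treatment effect (ATE) with a Gaussian confidence interval. The sketch below only illustrates that overall shape and is not the authors' procedure. Its assumptions are stated loudly: cleaning is done by a rank-r truncated SVD of the zero-filled, rescaled matrix; the ATE is estimated by a generic cross-fitted augmented inverse propensity weighting (AIPW) estimator with linear outcome models and a logistic propensity model; the function names `clean_covariates` and `aipw_ate` are hypothetical; and the interval is a standard AIPW interval, so it does not include the paper's data cleaning adjustment.

```python
import numpy as np


def clean_covariates(Z, rank):
    """Hypothetical cleaning step: rank-`rank` truncated SVD of a corrupted matrix.

    Z is an n x p array with np.nan marking missing entries. Missing entries are
    zero-filled and the matrix is rescaled by 1 / (observed fraction), which makes
    it approximately unbiased for the true covariates when entries are missing
    completely at random. This is an assumed stand-in, not the paper's method.
    """
    observed = ~np.isnan(Z)
    rho = max(observed.mean(), 1.0 / Z.size)  # fraction of observed entries
    filled = np.where(observed, Z, 0.0) / rho
    U, s, Vt = np.linalg.svd(filled, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]  # keep the top `rank` components


def _with_intercept(X):
    return np.column_stack([np.ones(len(X)), X])


def _fit_ols(X, y):
    """Least-squares outcome model; lstsq tolerates the rank-deficient cleaned X."""
    beta, *_ = np.linalg.lstsq(_with_intercept(X), y, rcond=None)
    return lambda Xn: _with_intercept(Xn) @ beta


def _fit_logit(X, d, iters=25, ridge=1e-3):
    """Logistic propensity model fit by Newton's method with a small ridge penalty."""
    X1 = _with_intercept(X)
    beta = np.zeros(X1.shape[1])
    for _ in range(iters):
        prob = 1.0 / (1.0 + np.exp(-X1 @ beta))
        H = X1.T @ (X1 * (prob * (1 - prob))[:, None]) + ridge * np.eye(X1.shape[1])
        beta += np.linalg.solve(H, X1.T @ (d - prob) - ridge * beta)
    return lambda Xn: 1.0 / (1.0 + np.exp(-_with_intercept(Xn) @ beta))


def aipw_ate(X, d, y, folds=2, seed=0, clip=0.01):
    """Cross-fitted AIPW estimate of the ATE with a normal-approximation 95% CI."""
    n = len(y)
    fold_id = np.random.default_rng(seed).permutation(n) % folds
    psi = np.empty(n)
    for k in range(folds):
        test, train = fold_id == k, fold_id != k
        Xtr, dtr, ytr = X[train], d[train], y[train]
        mu1 = _fit_ols(Xtr[dtr == 1], ytr[dtr == 1])  # E[Y | X, D = 1]
        mu0 = _fit_ols(Xtr[dtr == 0], ytr[dtr == 0])  # E[Y | X, D = 0]
        pi = _fit_logit(Xtr, dtr)                       # P(D = 1 | X)
        Xte, dte, yte = X[test], d[test], y[test]
        m1, m0 = mu1(Xte), mu0(Xte)
        e_hat = np.clip(pi(Xte), clip, 1 - clip)
        # Doubly robust AIPW score; its sample mean estimates the ATE.
        psi[test] = (m1 - m0 + dte * (yte - m1) / e_hat
                     - (1 - dte) * (yte - m0) / (1 - e_hat))
    theta = psi.mean()
    se = psi.std(ddof=1) / np.sqrt(n)
    return theta, (theta - 1.96 * se, theta + 1.96 * se)


# Illustrative simulation (invented data): low-rank covariates, noise, missingness.
rng = np.random.default_rng(1)
n, p, r = 2000, 20, 3
F = rng.normal(size=(n, r))                       # latent low-rank factors
X_true = F @ rng.normal(size=(r, p))              # true covariates, rank r
Z = X_true + rng.normal(scale=0.5, size=(n, p))   # measurement error
Z[rng.random((n, p)) < 0.2] = np.nan              # about 20% missing at random
d = rng.binomial(1, 1 / (1 + np.exp(-F[:, 0]))).astype(float)      # confounded treatment
y = 2.0 * d + F @ np.array([1.0, -1.0, 0.5]) + rng.normal(size=n)  # true ATE = 2

X_hat = clean_covariates(Z, rank=r)
theta, ci = aipw_ate(X_hat, d, y)
print(f"ATE estimate {theta:.3f}, 95% CI ({ci[0]:.3f}, {ci[1]:.3f})")
```

The simulated data (rank-3 covariates with additive noise, roughly 20% missing entries, and a true ATE of 2) are invented for illustration. Truncated SVD is used here because it is about the simplest estimator that exploits low rank; the rank r is assumed known, whereas in practice it would have to be chosen from the data.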
