A Feasibility Study of Differentially Private Summary Statistics and Regression Analyses for Administrative Tax Data


October 22, 2021


Andrés F. Barrientos, Aaron R. Williams, Joshua Snoke, Claire McKay Bowen

Federal administrative tax data are invaluable for research, but because of privacy concerns, access to these data is typically limited to select agencies and a few individuals. An alternative to sharing microlevel data is a validation server, which allows individuals to query statistics without accessing the confidential data. This paper studies the feasibility of using differentially private (DP) methods to implement such a server. We provide an extensive study of existing DP methods for releasing tabular statistics, means, quantiles, and regression estimates. We also introduce methodological adaptations of existing DP regression algorithms so that they can handle new data types and return standard error estimates. We evaluate the selected methods based on the accuracy of their output for statistical analyses, using real administrative tax data obtained from the Internal Revenue Service Statistics of Income (SOI) Division. Our findings show that a validation server would be feasible for simple statistics but would struggle to produce accurate regression estimates and confidence intervals. We outline the challenges and offer recommendations for future work on validation servers. This is the first comprehensive statistical study of DP methodology on a real, complex dataset, and it has significant implications for the direction of a growing research field.
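To make "releasing means" under DP concrete, here is a minimal sketch of one textbook approach, assuming the Laplace mechanism, add/remove-one neighboring datasets, and clipping bounds that are public and chosen without looking at the confidential data. It is not necessarily one of the methods evaluated in the paper. The function names `dp_count`, `dp_clipped_sum`, and `dp_mean` are hypothetical. The mean is formed from a noisy clipped sum (sensitivity $\max(|L|, |U|)$) and a noisy count (sensitivity 1), each given half of the budget $\varepsilon$; the data in the demo are synthetic, not tax records.

```python
import numpy as np


def dp_count(n_records: int, epsilon: float, rng: np.random.Generator) -> float:
    """Laplace mechanism for a count: adding/removing one record changes it by at most 1."""
    return n_records + rng.laplace(loc=0.0, scale=1.0 / epsilon)


def dp_clipped_sum(values: np.ndarray, lower: float, upper: float,
                   epsilon: float, rng: np.random.Generator) -> float:
    """Laplace mechanism for a sum of values clipped to [lower, upper]."""
    clipped = np.clip(values, lower, upper)
    # One record contributes at most max(|lower|, |upper|) to the clipped sum.
    sensitivity = max(abs(lower), abs(upper))
    return float(clipped.sum()) + rng.laplace(loc=0.0, scale=sensitivity / epsilon)


def dp_mean(values: np.ndarray, lower: float, upper: float,
            epsilon: float, rng: np.random.Generator) -> float:
    """epsilon-DP mean: split the budget between a noisy sum and a noisy count."""
    eps_half = epsilon / 2.0  # sequential composition: eps/2 + eps/2 = eps
    noisy_sum = dp_clipped_sum(values, lower, upper, eps_half, rng)
    # Post-processing guard: keep the denominator positive (costs no extra budget).
    noisy_count = max(dp_count(len(values), eps_half, rng), 1.0)
    # Clamping the ratio to the public bounds is also post-processing.
    return float(np.clip(noisy_sum / noisy_count, lower, upper))


if __name__ == "__main__":
    rng = np.random.default_rng(seed=2021)
    # Synthetic, illustrative "incomes"; NOT real tax data.
    incomes = rng.lognormal(mean=10.5, sigma=1.0, size=10_000)
    lower, upper = 0.0, 500_000.0  # hypothetical public bounds
    for eps in (0.1, 1.0, 10.0):
        print(f"epsilon={eps:>4}: DP mean = {dp_mean(incomes, lower, upper, eps, rng):,.0f}")
    print(f"clipped non-private mean = {np.clip(incomes, lower, upper).mean():,.0f}")
```

Each call spends its own $\varepsilon$, so a validation server answering many queries would need to track the cumulative budget across them; under a fixed total budget, each additional query leaves less budget, and therefore more noise, for the others.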
