Identification Risk Evaluation of Continuous Synthesized Variables

We propose a general approach to evaluating the identification risk of continuous synthesized variables in partially synthetic data. We introduce a radius $r$ into the construction of the identification risk probability of each target record, and illustrate it with working examples for one or more continuous synthesized variables. We demonstrate our methods on a data sample from the Consumer Expenditure Surveys (CE), and discuss how risk and data utility are affected by 1) the choice of radius $r$, 2) the choice of synthesized variables, and 3) the choice of the number of synthetic datasets. We give recommendations to statistical agencies on synthesizing continuous variables and evaluating their identification risk. We provide an R package that implements the proposed identification risk evaluation methods, along with sample R scripts.
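To make the role of the radius $r$ concrete, the following is a minimal Python sketch, not the authors' R package. It rests on several assumptions that the abstract does not state: the radius is taken as relative (a synthetic value matches a target record when it lies within $r$ times the target's true value), the intruder also knows one unsynthesized categorical variable, and the risk of a record is 1 divided by the number of matching records when the true record is among the matches, and 0 otherwise. The function name `radius_match_risk` and its parameters are hypothetical.

```python
import numpy as np

def radius_match_risk(original, synthetic, known, r):
    """Per-record identification risk for one continuous synthesized variable.

    Assumptions (illustrative, not taken from the source):
    - `r` is a relative radius: a synthetic value matches target record i
      if |synthetic_j - original_i| <= r * |original_i|.
    - `known` holds an unsynthesized categorical variable the intruder knows.
    - Risk is 1 / (number of matches) if record i is among its own matches,
      and 0 otherwise.
    """
    original = np.asarray(original, dtype=float)
    synthetic = np.asarray(synthetic, dtype=float)
    known = np.asarray(known)
    n = original.shape[0]
    risks = np.zeros(n)
    for i in range(n):
        # Records sharing the intruder's known information about record i.
        same_known = known == known[i]
        # Synthetic values falling inside the radius around the true value.
        within = np.abs(synthetic - original[i]) <= r * np.abs(original[i])
        matches = same_known & within
        # The true record must be among the matches for any risk to accrue.
        if matches[i]:
            risks[i] = 1.0 / matches.sum()
    return risks

# Toy data: four records, one known category, a 10% relative radius.
risks = radius_match_risk(
    original=[100, 105, 200, 50],
    synthetic=[102, 98, 150, 51],
    known=["a", "a", "a", "b"],
    r=0.1,
)
print(risks.tolist())  # [0.5, 0.5, 0.0, 1.0]
```

Under these assumptions, the third record carries no risk because its synthetic value falls outside the radius, while the fourth record is matched uniquely and has risk 1. How the count of matches changes with $r$ depends on the data, so the effect of the radius on risk is best examined empirically, for example by evaluating the function over a grid of $r$ values.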

