Identification Risks Evaluation of Partially Synthetic Data with the $\texttt{IdentificationRiskCalculation}$ R Package

We propose a general approach to evaluating the identification risk of continuous synthesized variables in partially synthetic data. We introduce a radius $r$ to construct the identification risk probability of each target record, and illustrate the approach with worked examples for one or more continuous synthesized variables. We demonstrate our methods on a data sample from the Consumer Expenditure Surveys (CE), and discuss how risk and data utility are affected by 1) the choice of radius $r$, 2) the choice of synthesized variables, and 3) the number of synthetic datasets. We give recommendations to statistical agencies for synthesizing continuous variables and evaluating their identification risk. We provide an R package that implements the proposed identification risk evaluation methods, and include sample R scripts.
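The abstract does not spell out how the radius $r$ enters the risk calculation, so the following is only a minimal sketch of one plausible reading, not the API of the R package. It assumes that the intruder knows each target record's true value and an unsynthesized grouping variable, that a synthetic record "matches" when it shares the group and its synthetic value lies within a relative radius of the true value, and that a record's risk is 1 divided by the number of matches when the target itself is among them. The function name `radius_match_risk`, the relative (percentage) radius, and the toy data are all hypothetical.

```python
def radius_match_risk(true_values, synthetic_values, groups, radius):
    """Per-record identification risk under a radius-based matching rule.

    Assumptions (illustrative, not taken from the package):
      - record i in the synthetic data corresponds to record i in the
        confidential data (partial synthesis keeps the records aligned);
      - `groups` holds an unsynthesized variable known to the intruder;
      - `radius` is relative: the match interval around a true value y
        is [y - radius*|y|, y + radius*|y|].
    """
    n = len(true_values)
    risks = []
    for i in range(n):
        half_width = radius * abs(true_values[i])
        # Synthetic records that agree on the known group and fall
        # inside the interval around the target's true value.
        matches = [
            j for j in range(n)
            if groups[j] == groups[i]
            and abs(synthetic_values[j] - true_values[i]) <= half_width
        ]
        c_i = len(matches)              # number of candidate matches
        t_i = 1 if i in matches else 0  # is the true record among them?
        risks.append(t_i / c_i if c_i > 0 else 0.0)
    return risks


# Toy data: three records, two groups.
true_y = [100.0, 105.0, 200.0]
synth_y = [102.0, 120.0, 140.0]
group = ["a", "a", "b"]

narrow = radius_match_risk(true_y, synth_y, group, radius=0.10)
wide = radius_match_risk(true_y, synth_y, group, radius=0.25)
```

In this toy example the narrow radius singles out the first record, while the wider radius pulls a second candidate into its interval and also brings the second record's own synthetic value within range, so a larger $r$ does not move every record's risk in the same direction. With several synthetic datasets, one natural extension under the same assumptions would be to average each record's risk across datasets.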
