Empirical economic research relies crucially on highly sensitive individual-level datasets. At the same time, the increasing availability of public individual-level data makes it possible for adversaries to re-identify anonymized records in sensitive research datasets. The most commonly accepted formal definition of an individual non-disclosure guarantee is differential privacy. It restricts researchers' interaction with the data to issuing queries, and the differential privacy mechanism then replaces the actual outcome of each query with a randomized outcome. The impact of differential privacy on the identification of empirical economic models and on the performance of estimators in nonlinear empirical econometric models has not been sufficiently studied. Since privacy protection mechanisms are inherently finite-sample procedures, we define identifiability of the parameter of interest under differential privacy as a property of the limit of experiments. This notion is naturally characterized by concepts from random set theory. We show that particular instances of regression discontinuity design may be problematic for inference with differential privacy, as the parameters turn out to be neither point nor partially identified. The set of differentially private estimators converges weakly to a random set. Our analysis suggests that many other estimators that rely on nuisance parameters may have similar properties under the requirement of differential privacy. We show that identification becomes possible if the target parameter can be deterministically located within the random set. In that case, a full exploration of the random set of weak limits of differentially private estimators can allow the data curator to select a sequence of instances of differentially private estimators that converges to the target parameter in probability.
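To illustrate what replacing the actual outcome of a query with a randomized outcome can look like, the following is a minimal sketch of the standard Laplace mechanism applied to a bounded mean query. The abstract does not say which mechanism the analysis uses, so the choice of the Laplace mechanism, the function names `laplace_noise` and `private_mean`, the clipping bounds, and the neighboring relation (one record replaced, sample size fixed) are all assumptions made here for illustration.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise.

    The difference of two independent exponential draws with mean `scale`
    follows a Laplace distribution with location 0 and that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_mean(values, lower, upper, epsilon, rng):
    """Answer a mean query with Laplace noise (illustrative assumption).

    Values are clipped to [lower, upper], so replacing one record changes
    the mean by at most (upper - lower) / n. Noise with scale
    sensitivity / epsilon then gives epsilon-differential privacy under
    that replace-one-record neighboring relation.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if not values:
        raise ValueError("values must be non-empty")
    clipped = [min(max(v, lower), upper) for v in values]
    true_mean = sum(clipped) / len(clipped)
    sensitivity = (upper - lower) / len(clipped)
    return true_mean + laplace_noise(sensitivity / epsilon, rng)


# Hypothetical usage: the researcher sees only the randomized answer.
rng = random.Random(2024)
incomes = [0.2, 0.4, 0.9, 1.5]  # made-up data, clipped to [0, 1] below
noisy_answer = private_mean(incomes, 0.0, 1.0, epsilon=1.0, rng=rng)
```

Because the noise scale is fixed by the sample size and the privacy budget rather than by the data, each release is a finite-sample randomization of the query, which is the setting in which the abstract frames identifiability as a property of the limit of experiments.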