Generative models trained using Differential Privacy (DP) are increasingly used to produce and share synthetic data in a privacy-friendly manner. In this paper, we set out to analyze the impact of DP on these models vis-à-vis underrepresented classes and subgroups of data. We do so from two angles: 1) the size of classes and subgroups in the synthetic data, and 2) classification accuracy on them. We also evaluate the effect of various levels of imbalance and privacy budgets. Our experiments, conducted using three state-of-the-art DP models (PrivBayes, DP-WGAN, and PATE-GAN), show that DP shifts the size distributions in the generated synthetic data in opposite directions. More precisely, it affects the gap between the majority and minority classes and subgroups, either reducing it (a "Robin Hood" effect) or increasing it (a "Matthew" effect). However, both of these size shifts lead to similar disparate impacts on a classifier's accuracy, disproportionately affecting the underrepresented parts of the data. As a result, we call for caution when analyzing or training a model on synthetic data, as one otherwise risks treating different subpopulations unevenly, which may also lead to unreliable conclusions.
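To make the two angles concrete, below is a minimal sketch (not the authors' code) of how one might audit subgroup sizes and per-subgroup accuracy when comparing real and synthetic data; the DataFrame and column names are illustrative assumptions.

```python
# Illustrative audit of synthetic data: (1) how subgroup sizes shift between the
# real and synthetic data ("Robin Hood" vs. "Matthew" effects), and (2) how a
# classifier trained on synthetic data performs on each subgroup of a real test set.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

def subgroup_size_shift(real_df: pd.DataFrame, synth_df: pd.DataFrame, col: str) -> pd.DataFrame:
    """Fraction of each subgroup in the real vs. synthetic data."""
    real = real_df[col].value_counts(normalize=True).rename("real")
    synth = synth_df[col].value_counts(normalize=True).rename("synthetic")
    return pd.concat([real, synth], axis=1).fillna(0.0)

def per_subgroup_accuracy(synth_df, test_df, features, label, group):
    """Train on synthetic data, report test accuracy separately for each subgroup."""
    clf = RandomForestClassifier().fit(synth_df[features], synth_df[label])
    return {
        g: accuracy_score(part[label], clf.predict(part[features]))
        for g, part in test_df.groupby(group)
    }
```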

Related content

The ACM/IEEE 23rd International Conference on Model Driven Engineering Languages and Systems (MODELS) is the premier conference series for model-driven software and systems engineering, organized with the support of ACM SIGSOFT and IEEE TCSE. Since 1998, MODELS has covered all aspects of modeling, from languages and methods to tools and applications. Its participants come from diverse backgrounds, including researchers, academics, engineers, and industry professionals. MODELS 2019 is a forum where participants can exchange cutting-edge research results and innovative practical experience around modeling and model-driven software and systems. This year's edition will give the modeling community an opportunity to further advance the foundations of modeling and to present innovative applications of modeling in emerging areas such as cyber-physical systems, embedded systems, socio-technical systems, cloud computing, big data, machine learning, security, open source, and sustainability. Official website: http://www.modelsconference.org/

Federated Learning (FL) allows multiple participating clients to train machine learning models collaboratively by keeping their datasets local and only exchanging model updates. Existing FL protocol designs have been shown to be vulnerable to attacks that aim to compromise data privacy and/or model robustness. Recently proposed defenses have focused on ensuring either privacy or robustness, but not both. In this paper, we develop a framework called PRECAD, which simultaneously achieves differential privacy (DP) and enhances robustness against model poisoning attacks with the help of cryptography. Using secure multi-party computation (MPC) techniques (e.g., secret sharing), noise is added to the model updates by the honest-but-curious server(s) (instead of each client) without revealing clients' inputs, which achieves the benefit of centralized DP in terms of providing a better privacy-utility tradeoff than local-DP-based solutions. Meanwhile, a crypto-aided secure validation protocol is designed to verify that the contribution of each client's model update is bounded without leaking privacy. We show analytically that the noise added to ensure DP also provides enhanced robustness against malicious model submissions. We experimentally demonstrate that our PRECAD framework achieves a better privacy-utility tradeoff and improves the robustness of the trained models.
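The following is a minimal, illustrative sketch (not the PRECAD protocol) of the core idea of combining additive secret sharing with server-side noise, assuming two honest-but-curious servers and real-valued shares; a real deployment would work over a finite field and include the secure validation step.

```python
# Each client additively secret-shares its clipped update between two servers;
# the servers aggregate their shares and add Gaussian noise before recombining,
# so no single server sees an individual update and the released sum is noisy.
import numpy as np

def share_update(update: np.ndarray, clip: float, rng: np.random.Generator):
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip / (norm + 1e-12))  # bound each client's contribution
    share0 = rng.normal(size=clipped.shape)             # random mask held by server 0
    share1 = clipped - share0                            # the two shares sum to the update
    return share0, share1

def aggregate(shares0, shares1, clip, noise_multiplier, rng):
    agg0, agg1 = sum(shares0), sum(shares1)
    noise = rng.normal(scale=noise_multiplier * clip, size=agg0.shape)
    return agg0 + agg1 + noise                           # noisy sum, central-DP style
```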

Deep neural networks have been shown to be very powerful methods for many supervised learning tasks. However, they can also easily overfit to training set biases, i.e., label noise and class imbalance. While both learning with noisy labels and class-imbalanced learning have received tremendous attention, existing works mainly focus on one of these two training set biases. To fill the gap, we propose \textit{Prototypical Classifier}, which does not require fitting additional parameters given the embedding network. Unlike conventional classifiers that are biased towards head classes, Prototypical Classifier produces balanced and comparable predictions for all classes even though the training set is class-imbalanced. By leveraging this appealing property, we can easily detect noisy labels by thresholding the confidence scores predicted by Prototypical Classifier, where the threshold is dynamically adjusted over iterations. A sample reweighting strategy is then applied to mitigate the influence of noisy labels. We test our method on CIFAR-10-LT, CIFAR-100-LT and WebVision datasets, observing that Prototypical Classifier obtains substantial improvements compared with the state of the art.
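A minimal sketch of a prototype-based classifier of this kind (illustrative, not the paper's exact pipeline), assuming embeddings have already been computed:

```python
# Class prototypes are mean embeddings; predictions are a softmax over negative
# distances to the prototypes; samples whose confidence in their given label
# falls below a threshold can be flagged as potentially noisy.
import numpy as np

def class_prototypes(embeddings: np.ndarray, labels: np.ndarray, num_classes: int) -> np.ndarray:
    protos = np.stack([embeddings[labels == c].mean(axis=0) for c in range(num_classes)])
    return protos / np.linalg.norm(protos, axis=1, keepdims=True)  # normalise for comparability

def prototype_confidence(embeddings: np.ndarray, protos: np.ndarray) -> np.ndarray:
    dists = np.linalg.norm(embeddings[:, None, :] - protos[None, :, :], axis=-1)
    logits = -dists
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    return probs / probs.sum(axis=1, keepdims=True)

# Example of flagging noisy labels with a (dynamically adjusted) threshold tau:
# flagged = prototype_confidence(emb, protos)[np.arange(len(y)), y] < tau
```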

From the distributional characterizations that lie at the heart of Stein's method we derive explicit formulae for the mass functions of discrete probability laws that identify those distributions. These identities are applied to develop tools for the solution of statistical problems. Our characterizations, and hence the applications built on them, do not require any knowledge about normalization constants of the probability laws. To demonstrate that our statistical methods are sound, we provide comparative simulation studies for the testing of fit to the Poisson distribution and for parameter estimation of the negative binomial family when both parameters are unknown. We also consider the problem of parameter estimation for discrete exponential-polynomial models which generally are non-normalized.
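For concreteness, the classical Stein characterization of the Poisson law is one instance of the kind of identity involved, and it already illustrates the independence from normalization constants:

```latex
% A non-negative integer-valued random variable X follows Poisson(lambda)
% if and only if the Stein identity below holds for all bounded test functions f.
% The equivalent recursion pins down the mass function without ever referring
% to the normalizing constant e^{-lambda}.
\begin{align}
  \mathbb{E}\big[\lambda f(X+1) - X f(X)\big] &= 0
    \quad \text{for all bounded } f:\mathbb{N}_0 \to \mathbb{R}
    \;\Longleftrightarrow\; X \sim \mathrm{Poisson}(\lambda), \\
  (k+1)\, p(k+1) &= \lambda\, p(k), \qquad k \in \mathbb{N}_0 .
\end{align}
```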

Generative models are typically trained on grid-like data such as images. As a result, the size of these models usually scales directly with the underlying grid resolution. In this paper, we abandon discretized grids and instead parameterize individual data points by continuous functions. We then build generative models by learning distributions over such functions. By treating data points as functions, we can abstract away from the specific type of data we train on and construct models that are agnostic to discretization. To train our model, we use an adversarial approach with a discriminator that acts on continuous signals. Through experiments on a wide variety of data modalities including images, 3D shapes and climate data, we demonstrate that our model can learn rich distributions of functions independently of data type and resolution.
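A minimal sketch of the general idea (simplified relative to the paper): a generator that maps a latent code and a coordinate to a function value, so samples can be evaluated at arbitrary resolution; all layer sizes are illustrative.

```python
# The generator defines a function per latent code: it can be queried at any set
# of coordinates, so it is agnostic to the discretization of the training data.
# A discriminator would act on (coordinate, value) pairs rather than a fixed grid.
import torch
import torch.nn as nn

class FunctionGenerator(nn.Module):
    def __init__(self, latent_dim=64, coord_dim=2, out_dim=3, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + coord_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, z, coords):
        # z: (B, latent_dim); coords: (B, N, coord_dim) -> values: (B, N, out_dim)
        z_rep = z.unsqueeze(1).expand(-1, coords.shape[1], -1)
        return self.net(torch.cat([z_rep, coords], dim=-1))

# The same generator can be sampled at any resolution, e.g.:
# values = FunctionGenerator()(torch.randn(8, 64), torch.rand(8, 1024, 2))
```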

Particle filters are not compatible with automatic differentiation due to the presence of discrete resampling steps. While known estimators for the score function, based on Fisher's identity, can be computed using particle filters, up to this point they required manual implementation. In this paper we show that such estimators can be computed using automatic differentiation, after introducing a simple correction to the particle weights. This correction utilizes the stop-gradient operator and does not modify the particle filter operation on the forward pass, while also being cheap and easy to compute. Surprisingly, with the same correction automatic differentiation also produces good estimators for gradients of expectations under the posterior. We can therefore regard our method as a general recipe for making particle filters differentiable. We additionally show that it produces desired estimators for second-order derivatives and how to extend it to further reduce variance at the expense of additional computation.
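A minimal sketch of the weight correction in PyTorch, assuming a bootstrap particle filter with multinomial resampling (simplified relative to the paper's presentation):

```python
# Resampling itself is non-differentiable, but multiplying each resampled weight
# by w / w.detach() (equal to 1 on the forward pass) lets gradients flow through
# the weights on the backward pass without changing the filter's forward behaviour.
import torch

def resample_with_correction(particles, log_w):
    w = torch.softmax(log_w, dim=0)                                # normalised weights
    idx = torch.multinomial(w.detach(), len(w), replacement=True)  # ancestor indices
    new_particles = particles[idx]
    # Forward pass: correction = 1, so resampled weights are the usual uniform 1/N.
    # Backward pass: gradients flow through w[idx].
    correction = w[idx] / w[idx].detach()
    new_log_w = torch.log(correction) - torch.log(torch.tensor(float(len(w))))
    return new_particles, new_log_w
```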

Differential Privacy (DP) has become a gold standard in privacy-preserving data analysis. While it provides one of the most rigorous notions of privacy, there are many settings where its applicability is limited. Our main contribution is in augmenting differential privacy with {\em Flexible Accuracy}, which allows small distortions in the input (e.g., dropping outliers) before measuring accuracy of the output, allowing one to extend DP mechanisms to high-sensitivity functions. We present mechanisms that can help in achieving this notion for functions that had no meaningful differentially private mechanisms previously. In particular, we illustrate an application to differentially private histograms, which in turn yields mechanisms for revealing the support of a dataset or the extremal values in the data. Analyses of our constructions exploit new versatile composition theorems that facilitate modular design. All the above extensions use our new definitional framework, which is in terms of "lossy Wasserstein distance" -- a 2-parameter error measure for distributions. This may be of independent interest.
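As background for the histogram application, here is the standard Laplace-mechanism histogram that such extensions build on; this is the textbook baseline, not the Flexible Accuracy mechanism itself.

```python
# Standard epsilon-DP histogram: adding or removing one record changes exactly one
# bin count by 1, so Laplace noise with scale 1/epsilon per bin suffices.
import numpy as np

def dp_histogram(values, bins, epsilon, rng=None):
    rng = rng or np.random.default_rng()
    counts, edges = np.histogram(values, bins=bins)
    noisy = counts + rng.laplace(scale=1.0 / epsilon, size=counts.shape)
    return noisy, edges
```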

Deep Generative Networks (DGNs) are extensively employed in Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and their variants to approximate the data manifold and the data distribution on that manifold. However, training samples are often obtained based on preferences, costs, or convenience, producing artifacts in the empirical data distribution, e.g., the large fraction of smiling faces in the CelebA dataset or the large fraction of dark-haired individuals in FFHQ. These inconsistencies will be reproduced when sampling from the trained DGN, which has far-reaching potential implications for fairness, data augmentation, anomaly detection, domain adaptation, and beyond. In response, we develop a differential-geometry-based sampler -- coined MaGNET -- that, given any trained DGN, produces samples that are uniformly distributed on the learned manifold. We prove theoretically and empirically that our technique produces a uniform distribution on the manifold regardless of the training set distribution. We perform a range of experiments on various datasets and DGNs. One of them considers the state-of-the-art StyleGAN2 trained on the FFHQ dataset, where uniform sampling via MaGNET increases distribution precision and recall by 4.1% and 3.0%, respectively, and decreases gender bias by 41.2%, without requiring labels or retraining.
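A minimal sketch of volume-weighted sampling on a learned manifold (simplified relative to MaGNET): latents are importance-resampled with weights proportional to the generator's volume element divided by the latent density, assuming the generator maps a single latent vector to a flattened output.

```python
# Draw latents from the usual Gaussian, weight each by sqrt(det(J^T J)) / p_z(z)
# (the volume element over the proposal density), and resample: the resulting
# generated points are approximately uniform on the learned manifold.
import math
import torch

def volume_weighted_samples(generator, latent_dim, num_proposals, num_samples):
    z = torch.randn(num_proposals, latent_dim)
    log_w = []
    for zi in z:
        J = torch.autograd.functional.jacobian(generator, zi).reshape(-1, latent_dim)
        log_vol = 0.5 * torch.logdet(J.T @ J)                   # log sqrt(det(J^T J))
        log_pz = -0.5 * (zi @ zi) - 0.5 * latent_dim * math.log(2 * math.pi)
        log_w.append(log_vol - log_pz)                           # importance log-weight
    w = torch.softmax(torch.stack(log_w), dim=0)
    idx = torch.multinomial(w, num_samples, replacement=True)
    return torch.stack([generator(zi) for zi in z[idx]])
```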

Recommendation algorithms are susceptible to popularity bias: a tendency to recommend popular items even when they fail to meet user needs. A related issue is that the recommendation quality can vary by demographic groups. Marginalized groups or groups that are under-represented in the training data may receive less relevant recommendations from these algorithms compared to others. In a recent study, Ekstrand et al. investigate how recommender performance varies according to popularity and demographics, and find statistically significant differences in recommendation utility between binary genders on two datasets, and significant effects based on age on one dataset. Here we reproduce those results and extend them with additional analyses. We find statistically significant differences in recommender performance by both age and gender. We observe that recommendation utility steadily degrades for older users, and is lower for women than men. We also find that the utility is higher for users from countries with more representation in the dataset. In addition, we find that total usage and the popularity of consumed content are strong predictors of recommender performance and also vary significantly across demographic groups.
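A minimal sketch of the kind of per-group breakdown used in such analyses; the column names are illustrative assumptions, not the study's actual schema.

```python
# Average per-user recommendation utility (e.g., NDCG) broken down by demographic
# attributes, the basic quantity behind the group-level comparisons described above.
import pandas as pd

def utility_by_group(results: pd.DataFrame, group_cols=("gender", "age_group", "country")):
    return {col: results.groupby(col)["ndcg"].agg(["mean", "count"]) for col in group_cols}
```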

Training machine learning models on sensitive user data has raised increasing privacy concerns in many areas. Federated learning is a popular approach for privacy protection that collects local gradient information instead of raw data. One way to achieve a strict privacy guarantee is to apply local differential privacy to federated learning. However, previous works do not give a practical solution due to three issues. First, the noisy data is close to its original value with high probability, increasing the risk of information exposure. Second, a large variance is introduced into the estimated average, causing poor accuracy. Last, the privacy budget explodes due to the high dimensionality of the weights in deep learning models. In this paper, we propose a novel design of a local differential privacy mechanism for federated learning that addresses the above issues. It is capable of making the data more distinct from its original value and introducing lower variance. Moreover, the proposed mechanism bypasses the curse of dimensionality by splitting and shuffling model updates. A series of empirical evaluations on three commonly used datasets, MNIST, Fashion-MNIST and CIFAR-10, demonstrate that our solution can not only achieve superior deep learning performance but also provide a strong privacy guarantee at the same time.
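A minimal, illustrative sketch of per-coordinate perturbation with splitting and shuffling (not the paper's exact mechanism); the randomizer shown is the standard bounded-value LDP mechanism, the privacy budget shown is per reported coordinate, and budget accounting across coordinates is omitted.

```python
# Each coordinate of a clipped update is perturbed with an unbiased LDP randomizer;
# the (client, coordinate) reports are then shuffled per coordinate, so the server
# cannot reassemble any single client's full high-dimensional update.
import numpy as np

def perturb_coordinate(x, c, eps, rng):
    """Unbiased epsilon-LDP randomizer for a scalar x in [-c, c] (Duchi-style)."""
    x = np.clip(x, -c, c) / c
    bound = (np.exp(eps) + 1) / (np.exp(eps) - 1)
    p_plus = 0.5 + 0.5 * x * (np.exp(eps) - 1) / (np.exp(eps) + 1)
    return c * bound * (1.0 if rng.random() < p_plus else -1.0)

def split_and_shuffle(updates, c, eps, rng):
    """updates: (num_clients, dim). Returns the averaged, shuffled per-dimension reports."""
    n, d = updates.shape
    reports = np.array([[perturb_coordinate(updates[i, j], c, eps, rng) for j in range(d)]
                        for i in range(n)])
    for j in range(d):
        reports[:, j] = rng.permutation(reports[:, j])  # break client-to-update linkage
    return reports.mean(axis=0)                          # unbiased estimate of the mean update
```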

Discrete random structures are important tools in Bayesian nonparametrics, and the resulting models have proven effective in density estimation, clustering, topic modeling and prediction, among others. In this paper, we consider nested processes and study the dependence structures they induce. Dependence ranges between homogeneity, corresponding to full exchangeability, and maximum heterogeneity, corresponding to (unconditional) independence across samples. The popular nested Dirichlet process is shown to degenerate to the fully exchangeable case when there are ties across samples at the observed or latent level. To overcome this drawback, inherent to nesting general discrete random measures, we introduce a novel class of latent nested processes. These are obtained by adding common and group-specific completely random measures and then normalising to yield dependent random probability measures. We provide results on the partition distributions induced by latent nested processes, and develop a Markov chain Monte Carlo sampler for Bayesian inference. A test for distributional homogeneity across groups is obtained as a by-product. The results and their inferential implications are showcased on synthetic and real data.
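Schematically, and keeping only the structure stated in the abstract (notation illustrative), the group-specific random probability measures take the form:

```latex
% A shared completely random measure mu_0 and group-specific measures mu_j are
% added and normalised, so ties across groups can occur at the latent level
% without forcing full exchangeability.
\begin{equation}
  \tilde{p}_j \;=\; \frac{\mu_0 + \mu_j}{\mu_0(\mathbb{X}) + \mu_j(\mathbb{X})},
  \qquad j = 1, \dots, d,
\end{equation}
where $\mu_0, \mu_1, \dots, \mu_d$ are independent completely random measures on $\mathbb{X}$.
```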

Related papers
Tong Wei, Jiang-Xin Shi, Yu-Feng Li, Min-Ling Zhang · Oct 22
Emilien Dupont, Yee Whye Teh, Arnaud Doucet · Oct 19
Adam Ścibior, Frank Wood · Oct 19
Aman Bansal, Rahul Chunduru, Deepesh Data, Manoj Prabhakaran · Oct 18
Ahmed Imtiaz Humayun, Randall Balestriero, Richard Baraniuk · Oct 18
Nicola Neophytou, Bhaskar Mitra, Catherine Stinson · Oct 15
Lichao Sun, Jianwei Qian, Xun Chen, Philip S. Yu · Jul 31, 2020
Federico Camerlenghi, David B. Dunson, Antonio Lijoi, Igor Prünster, Abel Rodríguez · Jan 15, 2018