Aggregate time-series data, such as traffic flow and site occupancy, are produced by repeatedly sampling statistics from a population over time. Such data can be profoundly useful for understanding trends within a given population, but they also pose a significant privacy risk, potentially revealing, e.g., who spends time where. Producing a private version of a time series satisfying the standard definition of Differential Privacy (DP) is challenging due to the large influence a single participant can have on the sequence: if an individual can contribute to each time step, the amount of additive noise needed to satisfy privacy grows linearly with the number of time steps sampled. As such, if a signal spans a long duration or is oversampled, an excessive amount of noise must be added, drowning out underlying trends. In many applications, however, an individual realistically cannot participate at every time step. When this is the case, we observe that the influence of a single participant (the sensitivity) can be reduced by subsampling and/or filtering in time, while still meeting privacy requirements. Using a novel analysis, we show this significant reduction in sensitivity and propose a corresponding class of privacy mechanisms. We demonstrate the utility benefits of these techniques empirically on real-world and synthetic time-series data.
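
As a rough illustration of the core idea, the sketch below releases a noisy count series under the plain Laplace mechanism, with sensitivity bounded by an assumed per-individual contribution cap (`max_contribs` and this simple mechanism are illustrative assumptions, not the paper's exact construction):

```python
import numpy as np

def private_time_series(counts, epsilon, max_contribs):
    """Release a noisy aggregate count series under epsilon-DP.

    If subsampling/filtering guarantees that each individual influences
    at most `max_contribs` time steps, the L1 sensitivity of the whole
    series is `max_contribs` rather than its length, so far less
    Laplace noise is needed.
    """
    counts = np.asarray(counts, dtype=float)
    sensitivity = max_contribs            # instead of len(counts)
    scale = sensitivity / epsilon         # Laplace scale for epsilon-DP
    return counts + np.random.laplace(0.0, scale, size=counts.shape)

# Hourly occupancy counts where each person appears in at most 3 hours.
hourly = [12, 15, 9, 22, 30, 28, 17, 11]
print(private_time_series(hourly, epsilon=1.0, max_contribs=3))
```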

### Related Content

The availability of genomic data is essential to progress in biomedical research, personalized medicine, etc. However, its extreme sensitivity makes it problematic, if not outright impossible, to publish or share. As a result, several initiatives have been launched to experiment with synthetic genomic data, e.g., using generative models to learn the underlying distribution of the real data and generate artificial datasets that preserve its salient characteristics without exposing it. This paper provides the first evaluation of both the utility and the privacy protection of six state-of-the-art models for generating synthetic genomic data. We assess the performance of the synthetic data on several common tasks, such as allele population statistics and linkage disequilibrium. We then measure privacy through the lens of membership inference attacks, i.e., inferring whether a record was part of the training data. Our experiments show that no single approach to generate synthetic genomic data yields both high utility and strong privacy across the board. Also, the size and nature of the training dataset matter. Moreover, while some combinations of datasets and models produce synthetic data with distributions close to the real data, there are often target data points that remain vulnerable to membership inference. Looking forward, our techniques can be used by practitioners to assess the risks of deploying synthetic genomic data in the wild and serve as a benchmark for future work.
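
To make the privacy measurement concrete, here is a minimal sketch of one common style of membership inference attack against generative models, scoring candidates by distance to the nearest synthetic record (the distance-based attack and the function names are assumptions for illustration; the paper's attacks may differ):

```python
import numpy as np

def membership_scores(synthetic, candidates):
    """Score candidate records for membership in the generator's
    training set: a candidate whose nearest synthetic record is very
    close is more likely to have been a training member."""
    synthetic = np.asarray(synthetic, dtype=float)
    scores = []
    for c in np.asarray(candidates, dtype=float):
        nearest = np.linalg.norm(synthetic - c, axis=1).min()
        scores.append(-nearest)   # higher score = more likely a member
    return np.array(scores)

# Candidates scoring above a threshold calibrated on reference records
# would be flagged as members of the training data.
```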

If devices are physically accessible, optical fault injection attacks pose a great threat, since both the processed data and the operation flow can be manipulated. Successful physical attacks may lead not only to leakage of secret information such as cryptographic private keys, but can also cause economic damage, especially if the manipulation results in a successful attack on critical infrastructure. Laser-based attacks exploit the sensitivity of CMOS technologies to electromagnetic radiation in the visible or infrared spectrum. It can be expected that radiation-hard designs, specially crafted for space applications, are more robust not only against high-energy particles and short electromagnetic waves but also against optical fault injection attacks. In this work we investigated the sensitivity of radiation-hard JICG shift registers to optical fault injection attacks. In our experiments, we were able to repeatedly trigger bit-set and bit-reset operations, changing the data stored in single JICG flip-flops despite their high radiation fault tolerance.

Without a specific functional context, non-functional requirements can only be approached as cross-cutting concerns and treated uniformly across all features of an application. This neglects, however, the heterogeneity of non-functional requirements that arises from stakeholder interests and the distinct functional scopes of software systems, which mutually influence how these non-functional requirements have to be satisfied. Earlier studies showed that the different types and objectives of non-functional requirements result in either vague or unbalanced specifications. We propose a task analytic approach for eliciting and modeling user tasks that captures the interests stakeholders pursue in the software product. Stakeholder interests are structurally related to user tasks, and each interest can be specified individually as a constraint on a specific user task. These constraints give DevOps teams important guidance on how a stakeholder's interest can be sufficiently satisfied throughout the software lifecycle. We propose a structured approach, intertwining task-oriented functional requirements with non-functional stakeholder interests to specify constraints at the level of user tasks. We also present the results of a case study with domain experts, which reveals that our task modeling and interest-tailoring method increases the comprehensibility of non-functional requirements as well as of their impact on the functional requirements, i.e., the users' tasks.

Probabilistic counters are well-known tools often used for space-efficient set cardinality estimation. In this paper we investigate probabilistic counters from the perspective of preserving privacy, using the standard, rigorous notion of differential privacy. The intuition is that probabilistic counters do not reveal too much information about individuals, but provide only general information about the population; thus they can be used safely without violating individuals' privacy. It turns out, however, that providing a precise, formal analysis of the privacy parameters of probabilistic counters is surprisingly difficult and requires advanced techniques and a very careful approach. We also demonstrate that probabilistic counters can be used as a privacy protection mechanism without any extra randomization: the randomization inherent in the protocol is sufficient for protecting privacy, even if the probabilistic counter is used many times. In particular, we present a specific privacy-preserving data aggregation protocol based on a probabilistic counter. Our results can be used, for example, in performing distributed surveys.
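
For concreteness, the sketch below implements the classic Morris probabilistic counter, the kind of structure whose inherent randomization the paper analyzes (this shows only the counting mechanism, not the paper's privacy analysis or aggregation protocol):

```python
import random

class MorrisCounter:
    """Morris's approximate counter: counts up to n using O(log log n)
    bits by storing roughly log2 of the true count."""

    def __init__(self):
        self.c = 0

    def increment(self):
        # Increase the register with probability 2^-c, so each stored
        # value corresponds to exponentially many increments.
        if random.random() < 2.0 ** (-self.c):
            self.c += 1

    def estimate(self):
        # Unbiased estimator of the number of increment() calls.
        return 2.0 ** self.c - 1.0

counter = MorrisCounter()
for _ in range(10_000):
    counter.increment()
print(counter.estimate())   # roughly 10000, with high variance
```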

Machine learning is increasingly used in the most diverse applications and domains, whether in healthcare to predict pathologies or in the financial sector to detect fraud. One of the linchpins of efficiency and accuracy in machine learning is data utility. However, when data contains personal information, full access may be restricted due to laws and regulations aiming to protect individuals' privacy. Therefore, data owners must ensure that any data they share guarantees such privacy. Removal or transformation of private information (de-identification) are among the most common techniques. Intuitively, one can anticipate that reducing detail or distorting information would result in losses in model predictive performance. However, previous work concerning classification tasks using de-identified data generally demonstrates that predictive performance can be preserved in specific applications. In this paper, we aim to evaluate the existence of a trade-off between data privacy and predictive performance in classification tasks. We leverage a large set of privacy-preserving techniques and learning algorithms to provide an assessment of re-identification ability and of the impact of transformed variants on predictive performance. Unlike previous literature, we confirm that the higher the level of privacy (i.e., the lower the re-identification risk), the higher the impact on predictive performance, pointing towards clear evidence of a trade-off.
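
As a toy illustration of the trade-off being measured, the sketch below de-identifies one quasi-identifier by generalization (exact age into 10-year bands) and compares predictive performance on synthetic data (the data, features, and choice of classifier are assumptions, not the paper's setup):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
age = rng.integers(18, 90, size=2000)
income = rng.normal(50_000, 15_000, size=2000)
y = (age + income / 1_000 + rng.normal(0, 5, size=2000) > 110).astype(int)

def accuracy_on(X):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
    return accuracy_score(y_te, model.predict(X_te))

X_raw = np.column_stack([age, income])
X_deid = np.column_stack([(age // 10) * 10, income])  # generalize age
print("raw:", accuracy_on(X_raw), "de-identified:", accuracy_on(X_deid))
```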

Imitation learning seeks to circumvent the difficulty in designing proper reward functions for training agents by utilizing expert behavior. With environments modeled as Markov Decision Processes (MDP), most of the existing imitation algorithms are contingent on the availability of expert demonstrations in the same MDP as the one in which a new imitation policy is to be learned. In this paper, we study the problem of how to imitate tasks when there exist discrepancies between the expert and agent MDP. These discrepancies across domains could include differing dynamics, viewpoint, or morphology; we present a novel framework to learn correspondences across such domains. Importantly, in contrast to prior works, we use unpaired and unaligned trajectories containing only states in the expert domain, to learn this correspondence. We utilize a cycle-consistency constraint on both the state space and a domain agnostic latent space to do this. In addition, we enforce consistency on the temporal position of states via a normalized position estimator function, to align the trajectories across the two domains. Once this correspondence is found, we can directly transfer the demonstrations on one domain to the other and use it for imitation. Experiments across a wide variety of challenging domains demonstrate the efficacy of our approach.
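
The cycle-consistency constraint on the state space can be sketched as follows; the mappings, dimensions, and loss form are illustrative assumptions, and the framework's domain-agnostic latent space and temporal position estimator are omitted for brevity:

```python
import torch
import torch.nn as nn

# Hypothetical mappings between the expert (E) and agent (A) state
# spaces; dimensions are placeholders.
f_E2A = nn.Linear(10, 8)   # expert state -> agent state
f_A2E = nn.Linear(8, 10)   # agent state -> expert state

def cycle_consistency_loss(s_E, s_A):
    """Unpaired state batches from each domain should survive a
    round trip through the learned correspondences."""
    loss_E = torch.mean((f_A2E(f_E2A(s_E)) - s_E) ** 2)
    loss_A = torch.mean((f_E2A(f_A2E(s_A)) - s_A) ** 2)
    return loss_E + loss_A

s_E = torch.randn(32, 10)  # unaligned expert states
s_A = torch.randn(32, 8)   # unaligned agent states
print(cycle_consistency_loss(s_E, s_A))
```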

Mitigating bias in machine learning systems requires refining our understanding of bias propagation pathways: from societal structures to large-scale data to trained models to impact on society. In this work, we focus on one aspect of the problem, namely bias amplification: the tendency of models to amplify the biases present in the data they are trained on. A metric for measuring bias amplification was introduced in the seminal work by Zhao et al. (2017); however, as we demonstrate, this metric suffers from a number of shortcomings including conflating different types of bias amplification and failing to account for varying base rates of protected classes. We introduce and analyze a new, decoupled metric for measuring bias amplification, $\text{BiasAmp}_{\rightarrow}$ (Directional Bias Amplification). We thoroughly analyze and discuss both the technical assumptions and the normative implications of this metric. We provide suggestions about its measurement by cautioning against predicting sensitive attributes, encouraging the use of confidence intervals due to fluctuations in the fairness of models across runs, and discussing the limitations of what this metric captures. Throughout this paper, we work to provide an interrogative look at the technical measurement of bias amplification, guided by our normative ideas of what we want it to encompass.
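
For intuition about what such a metric computes, here is a simplified, binary-group sketch in the spirit of the Zhao et al. (2017) co-occurrence metric that the paper critiques (not $\text{BiasAmp}_{\rightarrow}$ itself; the pair encoding and threshold are assumptions):

```python
import numpy as np

def bias_amplification(train_pairs, pred_pairs):
    """Mean increase, over tasks skewed toward group 0 in training,
    of that skew in the predictions. `*_pairs` are (group, task)
    tuples with binary groups."""
    def skew(pairs, task):
        groups = [g for g, t in pairs if t == task]
        return np.mean([g == 0 for g in groups])

    amps = []
    for task in {t for _, t in train_pairs}:
        b_train = skew(train_pairs, task)
        if b_train > 0.5:   # task co-occurs mostly with group 0
            amps.append(skew(pred_pairs, task) - b_train)
    return float(np.mean(amps)) if amps else 0.0

train = [(0, "cook"), (0, "cook"), (1, "cook"), (0, "drive")]
preds = [(0, "cook"), (0, "cook"), (0, "cook"), (0, "drive")]
print(bias_amplification(train, preds))   # ~0.17: skew was amplified
```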

We study few-shot supervised domain adaptation (DA) for regression problems, where only a few labeled target domain data and many labeled source domain data are available. Many of the current DA methods base their transfer assumptions on either parametrized distribution shift or apparent distribution similarities, e.g., identical conditionals or small distributional discrepancies. However, these assumptions may preclude the possibility of adaptation from intricately shifted and apparently very different distributions. To overcome this problem, we propose mechanism transfer, a meta-distributional scenario in which a data generating mechanism is invariant among domains. This transfer assumption can accommodate nonparametric shifts resulting in apparently different distributions while providing a solid statistical basis for DA. We take the structural equations in causal modeling as an example and propose a novel DA method, which is shown to be useful both theoretically and experimentally. Our method can be seen as the first attempt to fully leverage the structural causal models for DA.
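
The meta-distributional assumption can be illustrated with a toy structural causal model: the structural equation is shared across domains while the exogenous noise distribution differs, so the joint distributions look very different even though the generating mechanism is invariant (all specifics below are illustrative):

```python
import numpy as np

def mechanism(noise):
    # Invariant structural equation shared by all domains.
    x = noise[:, 0]
    y = np.tanh(2 * x) + 0.1 * noise[:, 1]
    return np.column_stack([x, y])

rng = np.random.default_rng(0)
source = mechanism(rng.normal(0, 1, size=(1000, 2)))    # source domain
target = mechanism(rng.laplace(0, 1, size=(1000, 2)))   # target domain
# The two joint distributions differ substantially, yet the mapping
# from noise to data is identical: the invariance that mechanism
# transfer exploits.
```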

Alternating Direction Method of Multipliers (ADMM) is a widely used tool for machine learning in distributed settings, where a machine learning model is trained over distributed data sources through an interactive process of local computation and message passing. Such an iterative process could cause privacy concerns of data owners. The goal of this paper is to provide differential privacy for ADMM-based distributed machine learning. Prior approaches on differentially private ADMM exhibit low utility under high privacy guarantee and often assume the objective functions of the learning problems to be smooth and strongly convex. To address these concerns, we propose a novel differentially private ADMM-based distributed learning algorithm called DP-ADMM, which combines an approximate augmented Lagrangian function with time-varying Gaussian noise addition in the iterative process to achieve higher utility for general objective functions under the same differential privacy guarantee. We also apply the moments accountant method to bound the end-to-end privacy loss. The theoretical analysis shows that DP-ADMM can be applied to a wider class of distributed learning problems, is provably convergent, and offers an explicit utility-privacy tradeoff. To our knowledge, this is the first paper to provide explicit convergence and utility properties for differentially private ADMM-based distributed learning algorithms. The evaluation results demonstrate that our approach can achieve good convergence and model accuracy under high end-to-end differential privacy guarantee.
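
A minimal sketch of the noise-addition pattern, assuming a simple gradient-style local update and an illustrative decaying noise schedule (the paper's actual update uses an approximate augmented Lagrangian and a calibrated time-varying scale):

```python
import numpy as np

def noisy_local_update(x, grad_fn, lr, sigma_t, rng):
    """One noisy local step: perturb the local update with Gaussian
    noise whose scale sigma_t varies across iterations."""
    return x - lr * (grad_fn(x) + rng.normal(0.0, sigma_t, size=x.shape))

rng = np.random.default_rng(0)
x = np.zeros(5)
for t in range(1, 101):
    sigma_t = 1.0 / np.sqrt(t)   # illustrative decaying schedule
    x = noisy_local_update(x, lambda v: 2 * v, lr=0.05,
                           sigma_t=sigma_t, rng=rng)
print(x)   # noisy minimizer of the toy quadratic objective ||x||^2
```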

Machine learning is a widely used method for generating predictions, and these predictions are more accurate when the model is trained on a larger dataset. On the other hand, data is usually divided amongst different entities. For privacy reasons, the training can be done locally and the models can then be safely aggregated amongst the participants. However, if there are only two participants in \textit{Collaborative Learning}, safe aggregation loses its power, since the output of the training already reveals much information about the participants. To resolve this issue, they must employ privacy-preserving mechanisms, which inevitably affect the accuracy of the model. In this paper, we model the training process as a two-player game where each player aims to achieve a higher accuracy while preserving its privacy. We introduce the notion of the \textit{Price of Privacy}, a novel approach to measure the effect of privacy protection on the accuracy of the model. We develop a theoretical model for different player types, and we either find or prove the existence of a Nash Equilibrium under some assumptions. Moreover, we confirm these assumptions via a Recommendation Systems use case: for a specific learning algorithm, we apply three privacy-preserving mechanisms on two real-world datasets. Finally, as complementary work for the designed game, we interpolate the relationship between privacy and accuracy for this use case and present three other methods to approximate it in a real-world scenario.
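
One natural way to instantiate such a measure is as the relative accuracy drop against an unprotected baseline; this is an illustrative reading, not necessarily the paper's formal game-theoretic definition:

```python
def price_of_privacy(acc_private, acc_baseline):
    """Relative accuracy lost by training with a privacy-preserving
    mechanism instead of no protection at all."""
    return (acc_baseline - acc_private) / acc_baseline

# A recommender reaching 0.80 accuracy without protection and 0.72
# with a privacy mechanism pays a Price of Privacy of 10%.
print(price_of_privacy(acc_private=0.72, acc_baseline=0.80))   # 0.1
```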

Bristena Oprisanu, Georgi Ganev, Emiliano De Cristofaro · Jan 18
Dmytro Petryk, Zoya Dyka, Roland Sorge, Jan Schaeffner, Peter Langendoerfer · Jan 18
Philipp Haindl, Reinhold Plösch · Jan 17
Dominik Bojko, Krzysztof Grining, Marek Klonowski · Jan 14
Tânia Carvalho, Nuno Moniz, Pedro Faria, Luís Antunes · Jan 13
Dripta S. Raychaudhuri, Sujoy Paul, Jeroen van Baar, Amit K. Roy-Chowdhury · May 20, 2021
Angelina Wang, Olga Russakovsky · Feb 24, 2021
Takeshi Teshima, Issei Sato, Masashi Sugiyama · Aug 19, 2020
Zonghao Huang, Rui Hu, Yuanxiong Guo, Eric Chan-Tin, Yanmin Gong · Mar 25, 2019
Balazs Pejo, Qiang Tang · Feb 28, 2018
