There is a constant trade-off between the utility of the data collected and processed by the many systems forming the Internet of Things (IoT) revolution and the privacy concerns of the users living in the spaces hosting these sensors. Privacy models, such as the SITA (Spatial, Identity, Temporal, and Activity) model, can help address this trade-off. In this paper, we focus on the problem of $CO_2$ prediction, which is crucial for health monitoring but can also be used to monitor occupancy, which might reveal private information. We apply a number of transformations to a real dataset from a Smart Building to simulate different SITA configurations on the collected data. We then use the transformed data with multiple Machine Learning (ML) techniques to analyse how well the resulting models predict $CO_2$ levels. Our results show that, compared to the baseline data, different SITA configurations do not make one algorithm perform better or worse than the others. In our experiments, the temporal dimension was particularly sensitive, with scores decreasing by up to $18.9\%$ between the original and the transformed data. These results can be useful to show the effect of different levels of data privacy on the data utility of IoT applications, and can also help to identify which parameters are more relevant for those systems so that higher privacy settings can be adopted while data utility is still preserved.