Anonymization requirements in data protection legislation can degrade forecast accuracy. To measure the potential severity of this problem, we derive theoretical bounds on the forecast loss incurred when additive exponential smoothing models are fit to protected data. Following the anonymization guidelines of the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA), we develop the $k$-nearest Time Series ($k$-nTS) Swapping and $k$-means Time Series ($k$-mTS) Shuffling methods, which create protected time series data that minimize the loss to forecasts while preventing a data intruder from detecting privacy issues. For efficient and effective decision making, we formulate an integer programming problem that finds a perfect matching for simultaneous data swapping within each cluster. We call this a two-party data privacy framework because our optimization model includes the utilities of both a data provider and a data intruder. We apply our data protection methods to thousands of time series and find that they maintain the forecasts and patterns (level, trend, and seasonality) of the time series well compared to the standard data protection methods suggested in legislation. Substantively, our paper addresses the challenge of protecting time series data that are used for forecasting. Our findings suggest the managerial importance of incorporating the concerns of forecasters into the data protection process itself.