While power systems research relies on the availability of real-world network datasets, data owners (e.g., system operators) are hesitant to share data due to security and privacy risks. To control these risks, we develop privacy-preserving algorithms for the synthetic generation of optimization and machine learning datasets. Taking a real-world dataset as input, the algorithms output a noisy, synthetic version of it that preserves the accuracy of the real data on a specific downstream model, or even across a large population of such models. We control the privacy loss using the Laplace and exponential mechanisms of differential privacy, and we preserve data accuracy using convex optimization as a post-processing step. We apply the algorithms to generate synthetic network parameters and wind power data.
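The two-step pattern described above (perturb with the Laplace mechanism, then restore data quality by post-processing) can be illustrated with a minimal sketch. This is not the algorithm from the paper. The wind power values, the sensitivity of 0.1, the privacy budget `epsilon`, and the function names are all hypothetical. The post-processing step here is a simple stand-in: a Euclidean projection onto the box [0, 1], which is a convex problem with a closed-form solution, rather than an optimization tied to a downstream model.

```python
import random


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample.

    The difference of two i.i.d. exponential variables with mean `scale`
    follows a Laplace distribution with location 0 and that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_mechanism(data, sensitivity, epsilon, rng):
    """Add Laplace noise with scale sensitivity / epsilon to every entry.

    Assumption: `sensitivity` bounds the L1 distance between any two
    adjacent datasets (e.g., datasets differing in one entry by at most
    that amount), so releasing the noisy vector is epsilon-DP.
    """
    scale = sensitivity / epsilon
    return [x + laplace_noise(scale, rng) for x in data]


def project_to_box(noisy, lower, upper):
    """Post-processing: solve min ||y - noisy||^2 s.t. lower <= y <= upper.

    The problem is convex and separable, so the solution is elementwise
    clipping. It uses only the noisy release, never the real data, so it
    does not consume additional privacy budget.
    """
    return [min(max(v, lower), upper) for v in noisy]


# Hypothetical normalized wind power measurements (fraction of capacity).
rng = random.Random(0)
wind = [0.12, 0.45, 0.80, 0.33]

noisy = laplace_mechanism(wind, sensitivity=0.1, epsilon=1.0, rng=rng)
synthetic = project_to_box(noisy, lower=0.0, upper=1.0)
print(synthetic)
```

Because the projection operates only on the already-privatized vector, the post-processing property of differential privacy means the synthetic output keeps the same epsilon guarantee as the noisy release. A post-processing step that also matches the outcome of a downstream model would add constraints or objective terms to this optimization.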