Balancing strong privacy guarantees with high predictive performance is critical for time series forecasting (TSF) tasks involving Electronic Health Records (EHR). In this study, we explore how data augmentation can mitigate Membership Inference Attacks (MIAs) on TSF models. We show that retraining with synthetic data can substantially weaken loss-based MIAs by lowering the ratio of the attacker's true-positive rate to its false-positive rate (TPR/FPR). The key challenge is to generate synthetic samples that resemble the original training data closely enough to confuse the attacker, while introducing enough novelty to improve the model's generalization to unseen data. We examine several augmentation strategies for strengthening model resilience without sacrificing accuracy: Zeroth-Order Optimization (ZOO), a variant of ZOO constrained by Principal Component Analysis (ZOO-PCA), and MixUp. Our experimental results show that ZOO-PCA yields the largest reductions in the attacker's TPR/FPR ratio without degrading performance on test data.
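To make the TPR/FPR metric concrete, here is a minimal sketch of a loss-based membership inference attack. It assumes the simplest variant: a single global loss threshold, where any sample whose loss under the model (for example, per-window forecasting error) falls below the threshold is predicted to be a training member. The function names and the example losses are hypothetical and do not reflect the exact attack configuration used in the study.

```python
from typing import List, Sequence, Tuple


def loss_threshold_attack(losses: Sequence[float], threshold: float) -> List[bool]:
    """Hypothetical attack: predict 'member' when the model's loss is below the threshold."""
    return [loss < threshold for loss in losses]


def tpr_fpr_ratio(
    member_losses: Sequence[float],
    nonmember_losses: Sequence[float],
    threshold: float,
) -> Tuple[float, float, float]:
    """Return (TPR, FPR, TPR/FPR) of the loss-threshold attack at a fixed threshold."""
    member_preds = loss_threshold_attack(member_losses, threshold)
    nonmember_preds = loss_threshold_attack(nonmember_losses, threshold)

    # TPR: fraction of true training members correctly flagged as members.
    tpr = sum(member_preds) / len(member_preds)
    # FPR: fraction of non-members wrongly flagged as members.
    fpr = sum(nonmember_preds) / len(nonmember_preds)

    # A ratio near 1 means the attacker does no better than chance at this threshold.
    ratio = tpr / fpr if fpr > 0 else float("inf")
    return tpr, fpr, ratio


if __name__ == "__main__":
    # Illustrative losses only: members tend to have lower loss than non-members.
    members = [0.1, 0.2, 0.3, 0.9]
    nonmembers = [0.25, 0.8, 1.0, 1.2]
    print(tpr_fpr_ratio(members, nonmembers, threshold=0.5))  # (0.75, 0.25, 3.0)
```

Under this framing, a defense such as retraining on augmented data succeeds when it brings member and non-member losses closer together, which drives the ratio toward 1 while test-set forecasting error stays unchanged.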