Enabling SQL-based Training Data Debugging for Federated Learning

How can we debug a logistic regression model in a federated learning setting when the model behaves unexpectedly (e.g., it rejects all high-income customers' loan applications)? The SQL-based training data debugging framework has proved effective at fixing this kind of issue in a non-federated learning setting. Given an unexpected query result over model predictions, this framework automatically removes label errors from the training data so that the unexpected behavior disappears in the retrained model. In this paper, we enable this powerful framework for federated learning. The key challenge is to develop a security protocol for federated debugging that is provably secure, efficient, and accurate. Achieving this goal requires us to investigate how to seamlessly integrate techniques from multiple fields (Databases, Machine Learning, and Cybersecurity). We first propose FedRain, which extends Rain, the state-of-the-art SQL-based training data debugging framework, to our federated learning setting. We address several technical challenges to make FedRain work and analyze its security guarantee and time complexity. The analysis shows that FedRain falls short in terms of both efficiency and security. To overcome these limitations, we redesign our security protocol and propose Frog, a novel SQL-based training data debugging framework tailored for federated learning. Our theoretical analysis shows that Frog is more secure, more accurate, and more efficient than FedRain. We conduct extensive experiments using several real-world datasets and a case study. The experimental results are consistent with our theoretical analysis and validate the effectiveness of Frog in practice.
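
The debugging loop summarized in the abstract (complain about a query result over predictions, delete suspicious training rows, retrain) can be illustrated with a toy, centralized sketch. This is not Rain, FedRain, or Frog: there is no federation and no security protocol, and the ranking uses a plain first-order influence estimate chosen only for illustration. The function names (`train`, `removal_scores`, `debug`, `run_demo`), the synthetic income data, the L2 strength, the batch size, and the deletion budget are all assumptions made for this example.

```python
import math
import random

L2 = 0.01  # assumed L2 strength; keeps the 2x2 Hessian invertible


def proba(theta, x):
    """P(approve | income x) under the model sigmoid(w * x + b)."""
    z = max(-30.0, min(30.0, theta[0] * x + theta[1]))  # clamp to avoid overflow
    return 1.0 / (1.0 + math.exp(-z))


def solve2(h, g):
    """Solve the 2x2 system h v = g with Cramer's rule."""
    (a, b), (c, d) = h
    det = a * d - b * c
    return [(d * g[0] - b * g[1]) / det, (a * g[1] - c * g[0]) / det]


def hessian(theta, rows):
    """Hessian of the mean log loss plus the L2 penalty."""
    hxx = hx1 = h11 = 0.0
    for x, _ in rows:
        p = proba(theta, x)
        w = p * (1.0 - p) / len(rows)
        hxx, hx1, h11 = hxx + w * x * x, hx1 + w * x, h11 + w
    return [[hxx + L2, hx1], [hx1, h11 + L2]]


def train(rows, steps=25):
    """Fit logistic regression with Newton's method, starting from zero."""
    theta = [0.0, 0.0]
    for _ in range(steps):
        g = [L2 * theta[0], L2 * theta[1]]
        for x, y in rows:
            r = (proba(theta, x) - y) / len(rows)
            g[0] += r * x
            g[1] += r
        step = solve2(hessian(theta, rows), g)
        theta = [theta[0] - step[0], theta[1] - step[1]]
    return theta


def approved_count(theta, customers):
    """Stand-in for: SELECT COUNT(*) FROM customers WHERE predict(income) = 1."""
    return sum(1 for x in customers if proba(theta, x) > 0.5)


def removal_scores(theta, rows, customers):
    """First-order estimate (up to a constant) of how much deleting each
    training row would raise the sum of approval probabilities."""
    gq = [0.0, 0.0]
    for x in customers:
        p = proba(theta, x)
        gq[0] += p * (1.0 - p) * x
        gq[1] += p * (1.0 - p)
    v = solve2(hessian(theta, rows), gq)  # H^{-1} grad_q (H is symmetric)
    return [(proba(theta, x) - y) * (v[0] * x + v[1]) for x, y in rows]


def debug(rows, customers, target, batch=10):
    """Delete top-scored rows and retrain until the complaint is resolved."""
    rows, removed = list(rows), []
    budget = len(rows) // 2  # assumed cap: never delete more than half the data
    theta = train(rows)
    while approved_count(theta, customers) < target and len(removed) < budget:
        scores = removal_scores(theta, rows, customers)
        top = set(sorted(range(len(rows)), key=scores.__getitem__)[-batch:])
        removed += [rows[i] for i in top]
        rows = [row for i, row in enumerate(rows) if i not in top]
        theta = train(rows)
    return theta, removed


def run_demo(n=300, seed=0):
    rng = random.Random(seed)
    incomes = [rng.random() for _ in range(n)]  # income scaled to [0, 1]
    # True rule: approve when income > 0.5. Injected label error: every
    # applicant with income > 0.7 is mislabeled as rejected.
    rows = [(x, 1 if 0.5 < x <= 0.7 else 0) for x in incomes]
    customers = [0.75, 0.80, 0.85, 0.90, 0.95]  # high-income applicants
    before = approved_count(train(rows), customers)
    theta, removed = debug(rows, customers, target=len(customers))
    after = approved_count(theta, customers)
    return before, after, removed


if __name__ == "__main__":
    before, after, removed = run_demo()
    print("approved before:", before, "after:", after, "rows removed:", len(removed))
```

Here the complaint is "all high-income applicants in `customers` should be approved". The loop keeps deleting the training rows whose removal is estimated to raise the approval probabilities the most, and stops once the count query returns the expected value. In a federated setting the features and labels would be held by different parties, which is why the paper needs a dedicated security protocol rather than a direct computation like this one.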