Vertical federated learning (VFL) is an increasingly popular approach to multi-party collaboration in training machine learning models. Industrial frameworks adopt secure multi-party computation methods such as homomorphic encryption to guarantee data security and privacy. However, a line of work has revealed that leakage risks remain in VFL. The leakage is caused by the correlation between the intermediate representations and the raw data. Because deep neural networks are powerful function approximators, an adversary can capture this correlation precisely and reconstruct the data. To counter the threat of data reconstruction attacks, we propose a hashing-based VFL framework, called \textit{HashVFL}, which cuts off reversibility directly. The one-way nature of hashing lets our framework block attempts to recover data from hash codes. However, integrating hashing also brings challenges, e.g., the loss of information. This paper identifies and addresses three challenges in integrating hashing: learnability, bit balance, and consistency. Experimental results demonstrate \textit{HashVFL}'s effectiveness in preserving the main task's performance and in defending against data reconstruction attacks. Furthermore, we analyze its potential value in detecting abnormal inputs. In addition, we conduct extensive experiments to demonstrate \textit{HashVFL}'s generalizability across various settings. In summary, \textit{HashVFL} provides a new perspective on protecting the data security and privacy of multiple parties in VFL. We hope our study will attract more researchers to expand the application domains of \textit{HashVFL}.
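To make the one-way property and the notion of bit balance concrete, the following is a minimal Python sketch. It is not the implementation of \textit{HashVFL}: it assumes a simple sign-based binarization of a party's embedding, and the names `hash_code`, `bit_balance`, and `hamming` are hypothetical helpers introduced only for illustration.

```python
def hash_code(embedding):
    """Binarize a real-valued embedding into a +1/-1 code (assumed sign rule)."""
    return [1 if x >= 0 else -1 for x in embedding]


def bit_balance(codes):
    """Per-bit mean over a batch of codes; 0.0 means a bit is +1 half the time."""
    n = len(codes)
    return [sum(column) / n for column in zip(*codes)]


def hamming(a, b):
    """Number of positions at which two codes differ."""
    return sum(x != y for x, y in zip(a, b))


# Two clearly different embeddings collapse to the same code, so the code
# alone does not determine which embedding (or raw input) produced it.
e1 = [0.3, -1.2, 2.5, -0.1]
e2 = [7.0, -0.01, 0.4, -9.0]
c1, c2 = hash_code(e1), hash_code(e2)
print(c1, c2, hamming(c1, c2))

# A bit that is always -1 carries no information; balance flags it.
print(bit_balance([[1, -1], [-1, -1]]))
```

In this toy setting the many-to-one mapping is what prevents exact inversion, and a per-bit mean far from zero indicates a wasted bit. How the binarization is made trainable (learnability) and how codes are kept aligned across parties (consistency) are not covered by this sketch.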