More and more organizations and institutions are making efforts to use external data to improve the performance of AI services. To address data privacy and security concerns, federated learning has attracted increasing attention from both academia and industry as a way to securely build AI models across multiple isolated data providers. In this paper, we study the efficiency problem of adapting the widely used XGBoost model, common in real-world applications, to the vertical federated learning setting. State-of-the-art vertical federated XGBoost frameworks require a large number of encryption operations and ciphertext transmissions, which makes model training far less efficient than training XGBoost models locally. To bridge this gap, we propose a novel batch homomorphic encryption method that cuts the cost of encryption-related computation and transmission nearly in half. This is achieved by encoding the first-order and second-order derivatives into a single number for encryption, ciphertext transmission, and homomorphic addition; the sums of multiple first-order and second-order derivatives can then be decoded simultaneously from the sum of the encoded values. We are motivated by the batching idea in BatchCrypt for horizontal federated learning, and design a novel batch method that addresses its limitation of supporting only a small number of negative values. The encoding procedure of the proposed batch method consists of four steps: shifting, truncating, quantizing, and batching, while the decoding procedure consists of de-quantization and shifting back. The advantages of our method are demonstrated through theoretical analysis and extensive numerical experiments.
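The four encoding steps (shifting, truncating, quantizing, batching) and the two decoding steps (de-quantization, shifting back) can be sketched as follows. This is a minimal illustrative sketch under assumed parameters, not the paper's implementation: the bit widths, the shift offset, and all names are assumptions, and plain integer addition stands in for homomorphic addition on ciphertexts.

```python
# Illustrative sketch: batch a first-order derivative g and a second-order
# derivative h into one integer so they can be encrypted, transmitted, and
# summed together. All constants are assumptions, not values from the paper.

BITS = 32                             # quantization width per derivative
PAD = 16                              # headroom so sums of up to 2**PAD values cannot overflow
SLOT = BITS + PAD                     # total bits reserved per derivative
SHIFT = 1.0                           # offset making shifted values non-negative (assumes |g|, |h| <= SHIFT)
SCALE = (2**BITS - 1) / (2 * SHIFT)   # maps [0, 2*SHIFT] onto [0, 2**BITS - 1]

def quantize(x: float) -> int:
    """Shift to a non-negative range, truncate, and quantize to an integer."""
    x = min(max(x, -SHIFT), SHIFT) + SHIFT   # shifting + truncating
    return round(x * SCALE)                  # quantizing

def encode(g: float, h: float) -> int:
    """Batch g and h into one integer; this single value would be encrypted once."""
    return (quantize(g) << SLOT) | quantize(h)

def decode_sum(total: int, n: int) -> tuple[float, float]:
    """Recover the sums of n first- and second-order derivatives at once."""
    qh = total & ((1 << SLOT) - 1)
    qg = total >> SLOT
    # de-quantization, then shifting back (each of the n values carried one SHIFT)
    return qg / SCALE - n * SHIFT, qh / SCALE - n * SHIFT

# Plain integer addition stands in for homomorphic addition of ciphertexts:
pairs = [(0.5, 0.25), (-0.3, 0.1), (0.8, 0.2)]
total = sum(encode(g, h) for g, h in pairs)
g_sum, h_sum = decode_sum(total, len(pairs))   # ~ (1.0, 0.55)
```

Because each derivative occupies its own bit slot with `PAD` bits of headroom, carries from summing many encoded values stay inside their slot, which is what lets one decryption recover both gradient sums.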



XGBoost, short for eXtreme Gradient Boosting, is a C++ implementation of the Gradient Boosting Machine that automatically exploits CPU multithreading for parallel training and incorporates algorithmic improvements that increase accuracy.