Increasing concerns about data privacy and security are driving an emerging field that studies privacy-preserving machine learning over isolated data sources, i.e., federated learning. Vertical federated learning, a class of federated learning in which different parties hold different features for common users, has great potential for driving a wide variety of business collaborations among enterprises in many fields. In machine learning, decision tree ensembles such as gradient boosting decision trees (GBDT) and random forests are widely applied, powerful models with high interpretability and modeling efficiency. However, state-of-the-art vertical federated learning frameworks adopt anonymous features to avoid possible data breaches, which compromises the interpretability of the model. To address this issue in the inference process, in this paper we first analyze why the meanings of features need to be disclosed to the Guest Party in vertical federated learning. We then observe that the prediction result of a tree can be expressed as the intersection of the results of the sub-models of the tree held by all parties. Building on this key observation, we protect data privacy while allowing the disclosure of feature meanings by concealing decision paths, and we adopt a communication-efficient secure computation method for the inference outputs. The advantages of Fed-EINI are demonstrated through both theoretical analysis and extensive numerical results: the interpretability of the model is improved by disclosing the meanings of features while ensuring efficiency and accuracy.
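To illustrate the key observation above, the following is a minimal sketch (not the paper's implementation) in which each party evaluates only the tree nodes whose split features it holds, returns the set of leaves still reachable given its local decisions, and the intersection of these candidate-leaf sets across parties recovers exactly the leaf the full tree would reach. All names, the tree layout, and the toy data below are illustrative assumptions.

```python
# Tree layout assumption: node 0 is the root; node i has children 2i+1 (left)
# and 2i+2 (right). A party's sub-model maps node id -> (feature, threshold)
# for the internal nodes whose split features that party owns.

def candidate_leaves(sub_model, sample, node=0, depth=0, max_depth=2):
    """Leaves reachable from `node` when only this party's splits are applied."""
    if depth == max_depth:                       # reached a leaf
        return {node}
    left, right = 2 * node + 1, 2 * node + 2
    if node in sub_model:                        # this party owns the split
        feat, thr = sub_model[node]
        child = left if sample[feat] < thr else right
        return candidate_leaves(sub_model, sample, child, depth + 1, max_depth)
    # another party owns this split: both branches remain possible locally
    return (candidate_leaves(sub_model, sample, left, depth + 1, max_depth)
            | candidate_leaves(sub_model, sample, right, depth + 1, max_depth))

# Toy example: a depth-2 tree split across two parties (hypothetical data).
party_A = {0: ("age", 30)}                            # Party A holds the root split
party_B = {1: ("income", 5000), 2: ("income", 8000)}  # Party B holds the depth-1 splits
leaf_weight = {3: 0.1, 4: 0.4, 5: -0.2, 6: 0.3}

sample = {"age": 25, "income": 6000}
leaves = candidate_leaves(party_A, sample) & candidate_leaves(party_B, sample)
(leaf,) = leaves                                 # the intersection is a single leaf
print(leaf, leaf_weight[leaf])                   # -> 4 0.4
```

In the actual framework, the candidate sets and leaf weights would not be exchanged in the clear; the paper pairs this decomposition with concealed decision paths and a communication-efficient secure computation of the inference outputs.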