The increasing concerns about data privacy and security have driven the emergence of privacy-preserving machine learning over isolated data sources, i.e., federated learning. One class of federated learning, \textit{vertical federated learning}, in which different parties hold different features for a common set of users, has great potential to drive a wide variety of business cooperation among enterprises in many fields. In machine learning, decision tree ensembles such as gradient boosting decision trees (GBDT) and random forests are widely applied, powerful models with high interpretability and modeling efficiency. However, state-of-the-art vertical federated learning frameworks adopt anonymous features to avoid possible data breaches, which compromises the interpretability of the model. To address this issue in the inference process, in this paper we first analyze the necessity of disclosing the meanings of features to the Guest party in vertical federated learning. We then observe that the prediction result of a tree can be expressed as the intersection of the results of the sub-models of the tree held by all parties. With this key observation, we protect data privacy and allow the disclosure of feature meanings by concealing decision paths, and we adopt a communication-efficient secure computation method for the inference outputs. The advantages of Fed-EINI are demonstrated through both theoretical analysis and extensive numerical results. We improve the interpretability of the model by disclosing the meanings of features while ensuring efficiency and accuracy.
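The key observation above can be illustrated with a minimal sketch (this is an illustrative toy, not the paper's actual secure protocol, and all feature names, thresholds, and helper functions below are hypothetical): each party keeps only the split conditions on its own features, marks a leaf as a candidate if every condition it can check locally holds, and the true prediction leaf is recovered as the intersection of the parties' candidate sets.

```python
# Hypothetical tree split across two parties: the root splits on f0
# (held by Party A), and both children split on f1 (held by Party B).
# Each leaf is described by the list of (feature, op, threshold)
# conditions on its decision path.
LEAVES = {
    0: [("f0", "<", 5), ("f1", "<", 3)],
    1: [("f0", "<", 5), ("f1", ">=", 3)],
    2: [("f0", ">=", 5), ("f1", "<", 7)],
    3: [("f0", ">=", 5), ("f1", ">=", 7)],
}

def candidate_leaves(sample, my_features):
    """Leaves whose locally checkable conditions all hold for this sample.

    Conditions on the other party's features cannot be evaluated locally,
    so they are treated as satisfiable.
    """
    out = set()
    for leaf, conds in LEAVES.items():
        ok = True
        for feat, op, thr in conds:
            if feat not in my_features:
                continue  # other party's feature: assume satisfiable
            value = sample[feat]
            if op == "<" and not value < thr:
                ok = False
            if op == ">=" and not value >= thr:
                ok = False
        if ok:
            out.add(leaf)
    return out

sample = {"f0": 7, "f1": 2}
set_a = candidate_leaves(sample, {"f0"})  # Party A's candidates: {2, 3}
set_b = candidate_leaves(sample, {"f1"})  # Party B's candidates: {0, 2}
prediction_leaf = set_a & set_b           # intersection: {2}
```

Because each party only ever evaluates thresholds on its own features, neither side learns the other's feature values; only the candidate leaf sets need to be combined, which is what makes a secure intersection-style computation over the outputs sufficient for inference.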