Tree-based models are currently among the most efficient machine learning techniques for data mining, owing to their accuracy, interpretability, and simplicity. The recent and orthogonal needs for more data and for privacy protection call for collaborative, privacy-preserving solutions. In this work, we survey the literature on distributed and privacy-preserving training of tree-based models and systematize its knowledge along four axes: the learning algorithm, the collaborative model, the protection mechanism, and the threat model. We use this systematization to identify the strengths and limitations of these works and to provide, for the first time, a framework for analyzing the information leakage that occurs in distributed tree-based model learning.