This paper introduces the first provably accurate algorithms for differentially private, top-down decision tree learning in the distributed setting (Balcan et al., 2012). We propose DP-TopDown, a general privacy-preserving decision tree learning algorithm, and present two distributed implementations. Our first method, NoisyCounts, naturally extends the single-machine algorithm by using the Laplace mechanism. Our second method, LocalRNM, significantly reduces communication and added noise by performing local optimization at each data holder. We provide the first utility guarantees for differentially private top-down decision tree learning in both the single-machine and distributed settings. These guarantees show that the error of the privately learned decision tree goes to zero quickly, provided the dataset is sufficiently large. Our extensive experiments on real datasets illustrate the trade-offs among privacy, accuracy, and generalization when learning private decision trees in the distributed setting.
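The abstract does not spell out the mechanisms, but the two methods it names admit a rough illustration. Below is a minimal Python sketch of NoisyCounts-style split selection under several assumptions not stated in the abstract: binary features and labels, a Gini impurity splitting criterion, and a per-node privacy budget. The helper names (`gini`, `noisy_counts_split`) are hypothetical, and the paper's actual DP-TopDown criterion and budget accounting may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

def gini(pos, total):
    """Gini impurity 2p(1-p); clips the possibly-noisy fraction to [0, 1]."""
    if total <= 0:
        return 0.0
    p = min(max(pos / total, 0.0), 1.0)
    return 2.0 * p * (1.0 - p)

def noisy_counts_split(X, y, epsilon):
    """NoisyCounts-style split selection (illustrative sketch).

    Releases Laplace-noised label histograms for each candidate split of a
    node and scores the splits from the noisy counts alone. Assumes binary
    0/1 features X (shape n x d) and binary 0/1 labels y. A record falls in
    exactly one (branch, label) cell per feature, so each per-feature
    histogram has L1 sensitivity 1, and Lap(1/epsilon) noise per cell gives
    epsilon-DP for that histogram; budget composition across features and
    tree levels is left out of the sketch.
    """
    n, d = X.shape
    best_feature, best_score = None, np.inf
    for j in range(d):
        score = 0.0
        for branch in (0, 1):
            mask = X[:, j] == branch
            pos = y[mask].sum() + rng.laplace(scale=1.0 / epsilon)
            neg = (mask.sum() - y[mask].sum()) + rng.laplace(scale=1.0 / epsilon)
            total = pos + neg
            # Weighted impurity of this branch, computed from noisy counts.
            score += max(total, 0.0) / n * gini(pos, max(total, 1e-9))
        if score < best_score:
            best_feature, best_score = j, score
    return best_feature
```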
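LocalRNM, going by its name and the abstract's description, presumably combines local optimization with a report-noisy-max-style selection; the actual protocol in the paper may differ. Continuing the sketch above (this reuses `gini`, `rng`, and the numpy import), here is a hypothetical version in which a data holder scores every candidate split exactly on its local shard and releases only a single noisy winner, rather than a whole table of noisy counts.

```python
def local_rnm_split(X, y, epsilon, sensitivity):
    """LocalRNM-style split selection (illustrative sketch; reuses gini/rng).

    Scores every candidate split exactly on the local shard, adds Laplace
    noise only to the d final scores, and reports the single noisy winner.
    `sensitivity` is the worst-case change in one score when one record is
    replaced; for the n-normalized Gini score used here it is O(1/n). The
    Lap(2 * sensitivity / epsilon) scale is the standard report-noisy-max
    calibration.
    """
    n, d = X.shape
    scores = np.empty(d)
    for j in range(d):
        scores[j] = sum(
            (X[:, j] == b).sum() / n
            * gini(y[X[:, j] == b].sum(), (X[:, j] == b).sum())
            for b in (0, 1)
        )
    noise = rng.laplace(scale=2.0 * sensitivity / epsilon, size=d)
    # Impurity is minimized, so this is report-noisy-max on negated scores.
    return int(np.argmin(scores + noise))
```

The design difference mirrors the abstract's claim: a NoisyCounts-style step must perturb and transmit a full table of counts per node, whereas a report-noisy-max step releases one index after a single noise draw per candidate score, trading communication and added noise for local computation at each data holder.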