We propose a new differentially private decision forest algorithm that minimizes both the number of queries required and the sensitivity of those queries. To do so, we build an ensemble of random decision trees that avoids querying the private data except to find the majority class label in the leaf nodes. Rather than using a count query to return the class counts, as the current state-of-the-art does, we use the Exponential Mechanism to output only the class label itself. This drastically reduces the sensitivity of the query -- often by several orders of magnitude -- which in turn reduces the amount of noise that must be added to preserve privacy. We achieve this improved sensitivity by using "smooth sensitivity", which takes into account the specific data used in the query rather than assuming the worst-case scenario. We also extend work on the optimal depth of random decision trees to handle continuous features, not just discrete features. This, along with several other improvements, allows us to create a differentially private decision forest with substantially higher predictive power than the current state-of-the-art.
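To make the leaf-label query concrete, the following is a minimal sketch of how the Exponential Mechanism can select a class label directly from a leaf's class counts instead of releasing noisy counts. The function name, the use of the raw class count as the utility function, and the global sensitivity of 1 are illustrative assumptions; the paper's smooth-sensitivity refinement, which would replace that constant with a data-dependent (and typically far smaller) value, is not implemented here.

```python
import numpy as np

def exponential_mechanism_label(class_counts, epsilon, sensitivity=1.0):
    """Select a single class label for a leaf node via the Exponential Mechanism.

    class_counts: dict mapping class label -> number of training records in the leaf
    epsilon: privacy budget allocated to this leaf query
    sensitivity: sensitivity of the utility function; 1.0 under the standard
        global-sensitivity analysis of a count-based utility. A smooth-sensitivity
        analysis would substitute a smaller, data-dependent value here.
    """
    labels = list(class_counts.keys())
    utilities = np.array([class_counts[label] for label in labels], dtype=float)

    # Exponential Mechanism: Pr[output = r] is proportional to
    # exp(epsilon * u(r) / (2 * sensitivity)), with u(r) = count of class r.
    scores = epsilon * utilities / (2.0 * sensitivity)
    scores -= scores.max()          # shift for numerical stability before exponentiating
    probs = np.exp(scores)
    probs /= probs.sum()

    return np.random.default_rng().choice(labels, p=probs)

# Example: a leaf holding 40 records of class "A" and 2 of class "B" almost
# always releases "A", without ever publishing the class counts themselves.
print(exponential_mechanism_label({"A": 40, "B": 2}, epsilon=0.5))
```

Because only a single label is released per leaf, the query's sensitivity depends on the utility function rather than on the full vector of class counts, which is what drives the reduction in added noise described above.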