Differential privacy (DP) is a rigorous, state-of-the-art notion of privacy for answering aggregate database queries while preserving the privacy of sensitive information in the data. In today's era of data analysis, however, it poses new challenges for users trying to understand the trends and anomalies observed in query results: Is an unexpected answer due to the data itself, or is it due to the extra noise that must be added to preserve DP? In the second case, even the users' observations about the query results may be wrong. In the first case, can we still mine interesting explanations from the sensitive data while protecting its privacy? To address these challenges, we present DPXPlain, a three-phase framework that, to the best of our knowledge, is the first system for explaining group-by aggregate query answers under DP. In its three phases, DPXPlain (a) answers a group-by aggregate query with DP, (b) allows users to compare the aggregate values of two groups and assesses, with high probability, whether this comparison holds or is flipped by the DP noise, and (c) finally provides an explanation table containing the approximate top-k explanation predicates along with their relative influences and ranks, expressed as confidence intervals, while guaranteeing DP in all steps. We perform an extensive experimental analysis of DPXPlain with multiple use cases on real and synthetic data, showing that DPXPlain efficiently provides insightful explanations with good accuracy and utility.
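As a rough illustration of the ideas behind phases (a) and (b), here is a minimal sketch that uses the textbook Laplace mechanism for a group-by COUNT, followed by a union-bound confidence interval for comparing two groups. It is not DPXPlain's actual mechanism. The function names `dp_group_by_count` and `compare_groups` are hypothetical, and the sketch assumes that the set of group keys is public and that neighboring databases differ by adding or removing one row, so the histogram has L1 sensitivity 1.

```python
import math
import random
from collections import Counter


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_group_by_count(rows, key, domain, epsilon, rng):
    """Noisy SELECT key, COUNT(*) ... GROUP BY key (Laplace mechanism sketch).

    Assumption: `domain` is a public list of group keys, so every group is
    released (with noise) even if it is empty; otherwise the presence of a
    group would itself leak information. Under add/remove-one-row neighbors,
    one row changes exactly one count by 1, so the L1 sensitivity is 1.
    """
    counts = Counter(key(r) for r in rows)
    scale = 1.0 / epsilon
    return {g: counts[g] + laplace_noise(scale, rng) for g in domain}


def compare_groups(noisy, g1, g2, epsilon, delta):
    """Interval for count(g1) - count(g2) that holds with prob. >= 1 - delta.

    Each noise term is Laplace(1/epsilon), so P(|noise| > t) = exp(-t * epsilon).
    By a union bound over the two groups,
    |noise1 - noise2| <= 2 * ln(2 / delta) / epsilon with prob. >= 1 - delta.
    This only post-processes released values, so it spends no extra budget.
    """
    margin = 2.0 * math.log(2.0 / delta) / epsilon
    diff = noisy[g1] - noisy[g2]
    interval = (diff - margin, diff + margin)
    if interval[0] > 0:
        verdict = "g1 > g2"
    elif interval[1] < 0:
        verdict = "g1 < g2"
    else:
        verdict = "inconclusive"  # the observed ordering may be a noise artifact
    return verdict, interval


rng = random.Random(0)
rows = [{"dept": "A"}] * 1000 + [{"dept": "B"}] * 10
noisy = dp_group_by_count(rows, key=lambda r: r["dept"],
                          domain=["A", "B", "C"], epsilon=1.0, rng=rng)
verdict, ci = compare_groups(noisy, "A", "B", epsilon=1.0, delta=0.05)
print(verdict)  # g1 > g2
```

Because the comparison step only post-processes the already-released noisy counts, it consumes no additional privacy budget. An "inconclusive" verdict signals that the ordering a user sees in the noisy answer could have been flipped by the DP noise, which is the situation phase (b) is meant to flag.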