Across academia, government, and industry, data stewards are facing increasing pressure to make datasets more openly accessible for researchers while also protecting the privacy of data subjects. Differential privacy (DP) is one promising way to offer privacy along with open access, but further inquiry is needed into the tensions between DP and data science. In this study, we conduct interviews with 19 data practitioners who are non-experts in DP as they use a DP data analysis prototype to release privacy-preserving statistics about sensitive data, in order to understand perceptions, challenges, and opportunities around using DP. We find that while DP is promising for providing wider access to sensitive datasets, it also introduces challenges into every stage of the data science workflow. We identify ethics and governance questions that arise when socializing data scientists around new privacy constraints and offer suggestions to better integrate DP and data science.