Open datasets that contain personal information are susceptible to adversarial attacks even when anonymized. By performing low-cost joins on multiple datasets with shared attributes, malicious users of open data portals can gain access to information that violates individuals' privacy. However, open datasets are primarily published using a release-and-forget model, whereby data owners and custodians have little to no awareness of these privacy risks. We address this critical gap by developing a visual analytic solution that enables data defenders to become aware of the disclosure risks in local, joinable data neighborhoods. The solution is derived through a design study with data privacy researchers, where we initially play the role of a red team and engage in an ethical data hacking exercise based on privacy attack scenarios. We use this problem and domain characterization to develop a set of visual analytic interventions as a defense mechanism and realize them in PRIVEE, a visual risk inspection workflow that acts as a proactive monitor for data defenders. PRIVEE uses a combination of risk scores and associated interactive visualizations to let data defenders explore vulnerable joins and interpret risks at multiple levels of data granularity. We demonstrate how PRIVEE can help emulate the attack strategies and diagnose disclosure risks through two case studies with data privacy experts.
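To make the join-based attack concrete, the following is a minimal sketch of how two anonymized datasets that share attributes can be linked, and how a defender might flag the result. It is not PRIVEE's risk score or implementation: the function names (`shared_attributes`, `unique_links`, `toy_risk_score`), the records, and the scoring rule (the fraction of records in one dataset that match exactly one record in the other) are all assumptions made for illustration.

```python
from collections import defaultdict

# Made-up toy records; neither dataset contains names or identifiers.
health = [
    {"age": 34, "zip": "10001", "gender": "F", "diagnosis": "asthma"},
    {"age": 34, "zip": "10001", "gender": "M", "diagnosis": "diabetes"},
    {"age": 51, "zip": "10002", "gender": "F", "diagnosis": "flu"},
    {"age": 51, "zip": "10002", "gender": "F", "diagnosis": "migraine"},
]
arrests = [
    {"age": 34, "zip": "10001", "gender": "F", "offense": "trespass"},
    {"age": 51, "zip": "10002", "gender": "F", "offense": "fraud"},
    {"age": 29, "zip": "10003", "gender": "M", "offense": "theft"},
]


def shared_attributes(a, b):
    """Return the sorted column names present in every record of both datasets."""
    cols_a = set.intersection(*(set(r) for r in a))
    cols_b = set.intersection(*(set(r) for r in b))
    return sorted(cols_a & cols_b)


def unique_links(a, b, keys):
    """Join a and b on keys, keeping only one-to-one matches.

    A one-to-one match means a single record in a and a single record
    in b share the same key values, so their attributes can be merged
    into one richer profile.
    """
    index_a, index_b = defaultdict(list), defaultdict(list)
    for r in a:
        index_a[tuple(r[k] for k in keys)].append(r)
    for r in b:
        index_b[tuple(r[k] for k in keys)].append(r)
    links = []
    for key, rows_a in index_a.items():
        rows_b = index_b.get(key, [])
        if len(rows_a) == 1 and len(rows_b) == 1:
            links.append({**rows_a[0], **rows_b[0]})
    return links


def toy_risk_score(a, b, keys):
    """Hypothetical score: share of records in a that link uniquely to b."""
    if not a:
        return 0.0
    return len(unique_links(a, b, keys)) / len(a)


keys = shared_attributes(health, arrests)
links = unique_links(health, arrests, keys)
score = toy_risk_score(health, arrests, keys)
```

In this toy setup only one health record links uniquely, and the joined row combines a diagnosis with an offense that neither dataset exposed on its own. The two records that share `(51, "10002", "F")` are not linked, because an ambiguous match does not single out an individual.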