Mining health data can lead to faster medical decisions, better quality of treatment, disease prevention, and reduced costs, and it drives innovative solutions within the healthcare sector. However, health data is highly sensitive and subject to regulations such as the General Data Protection Regulation (GDPR), which aims to ensure patients' privacy. Anonymization, or the removal of patient-identifiable information, is the most conventional approach and an important first step toward complying with these regulations and addressing privacy concerns. In this paper, we review existing anonymization techniques and their applicability to various types of health data (relational and graph-based). In addition, we provide an overview of possible attacks on anonymized data. We illustrate via a reconstruction attack that anonymization, though necessary, is not sufficient to protect patient privacy, and we discuss methods for protecting against such attacks. Finally, we discuss tools that can be used to achieve anonymization.