The General Data Protection Regulation (GDPR) grants all natural persons the right of access to their personal data when it is processed by data controllers. Data controllers are obliged to share the data in an electronic format and often provide it as a so-called Data Download Package (DDP). These DDPs contain all data collected by public and private entities during the course of citizens' digital lives and form a treasure trove for social scientists. However, the data can be deeply private. To protect the privacy of research participants while using their DDPs for scientific research, we developed de-identification software that handles typical characteristics of DDPs, such as regularly changing file structures, visual and textual content, and differing file formats and file structures, and that accounts for usernames. We investigate the performance of the software and illustrate how it can be tailored to specific DDP structures.