Objective: The exchange of health-related data is subject to regional laws and regulations, such as the General Data Protection Regulation (GDPR) in the EU or the Health Insurance Portability and Accountability Act (HIPAA) in the United States, which creates non-trivial challenges for researchers and educators working with these data. In pathology, the digitization of diagnostic tissue samples inevitably generates identifying data, which can include both sensitive and acquisition-related information and is stored in vendor-specific file formats. These Whole Slide Images (WSIs) are usually distributed and used outside the clinic in these formats, because an industry-wide standard such as DICOM has so far been adopted only tentatively and slide scanner vendors currently do not provide anonymization functionality. Methods: We developed a guideline for the proper handling of histopathological image data, particularly for research and education, with regard to the GDPR. In this context, we evaluated existing anonymization methods and examined proprietary format specifications to identify all sensitive information for the most common WSI formats. This work resulted in a software library that enables GDPR-compliant anonymization of WSIs while preserving the native formats. Results: Based on the analysis of proprietary formats, we identified all occurrences of sensitive information for file formats frequently used in clinical routine. We then developed an open-source programming library with an executable CLI tool and wrappers for different programming languages. Conclusions: Our analysis showed that there is no straightforward software solution for anonymizing WSIs in a GDPR-compliant way while maintaining the data format. We closed this gap with our extensible open-source library, which works instantaneously and offline.