This paper addresses the problem of automatically identifying clinically relevant patterns in medical datasets without compromising patient privacy. To this end, we treat each dataset as a black box for both internal and external data users, which lets us handle clinical data queries directly and far more efficiently. The novelty of the approach lies in avoiding the data de-identification process that is often used to preserve patient privacy. The implemented toolkit combines software engineering technologies such as Java EE and RESTful web services to exchange medical data in an unidentifiable XML format and to restrict users according to the need-to-know principle. Our technique also inhibits retrospective processing of data, such as attacks in which an adversary applies advanced computational methods to a medical dataset to reveal Protected Health Information (PHI). The approach is validated on an endoscopic reporting application based on the openEHR and MST standards. From a usability perspective, the approach can be used by clinical researchers and by governmental or non-governmental organizations to query datasets when monitoring health care services to improve quality of care.