Early detection of COVID-19 is an ongoing area of research that can help with triage, monitoring and general health assessment of potential patients, and may reduce operational strain on hospitals coping with the coronavirus pandemic. Different machine learning techniques have been used in the literature to detect coronavirus using routine clinical data (blood tests and vital signs). Data breaches and information leakage when using these models can bring reputational damage and cause legal issues for hospitals. Despite this, protecting healthcare models against leakage of potentially sensitive information is an understudied research area. In this work, we examine two machine learning approaches intended to predict a patient's COVID-19 status using routinely collected and readily available clinical data. We employ adversarial training to explore robust deep learning architectures that protect attributes related to demographic information about the patients. The two models we examine in this work are intended to preserve sensitive information against adversarial attacks and information leakage. In a series of experiments using datasets from Oxford University Hospitals, Bedfordshire Hospitals NHS Foundation Trust, University Hospitals Birmingham NHS Foundation Trust, and Portsmouth Hospitals University NHS Trust, we train and test two neural networks that predict PCR test results using information from basic laboratory blood tests and vital signs collected on a patient's arrival at hospital. We assess the level of privacy each model can provide and show the efficacy and robustness of our proposed architectures against a comparable baseline. One of our main contributions is that we specifically target the development of effective COVID-19 detection models with built-in mechanisms that selectively protect sensitive attributes against adversarial attacks.