Trusted Research Environments (TREs) are safe and secure environments in which researchers can access sensitive data. With the growth and diversity of medical data such as Electronic Health Records (EHRs), medical imaging, and genomic data, the use of Artificial Intelligence (AI) in general, and its subfield Machine Learning (ML) in particular, is increasing in the healthcare domain. This generates a desire to disclose new types of outputs from TREs, such as trained machine learning models. Although specific guidelines and policies exist for statistical disclosure control in TREs, they do not satisfactorily cover these new types of output requests. In this paper, we define some of the challenges around the application and disclosure of machine learning for healthcare within TREs. We describe the various vulnerabilities that the introduction of AI brings to TREs. We also provide an introduction to the different types and levels of risk associated with the disclosure of trained ML models. Finally, we describe new research opportunities in developing and adapting policies and tools for safely disclosing machine learning outputs from TREs.