Computer-aided detection (CAD) systems based on machine learning are expected to assist radiologists in making diagnoses. It is desirable to build CAD systems for the wide variety of diseases encountered daily in hospitals. An obstacle to developing a CAD system for a given disease is that the number of available medical images is typically too small for the machine learning model to reach good performance. In this paper, we explore ways to address this problem through a sim2real transfer approach in medical imaging. To provide a platform for evaluating sim2real transfer methods in this field, we construct a benchmark dataset consisting of $101$ chest X-ray images with pneumonia lesions that an experienced radiologist judged difficult to identify, together with a simulator based on fractal Perlin noise and the X-ray principle for generating pseudo pneumonia lesions. We then develop a novel domain randomization method, called Goldilocks-curriculum domain randomization (GDR), and evaluate it on this platform.
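To make the simulator idea concrete, the following is a minimal sketch of how a pseudo lesion could be generated from fractal Perlin noise and an X-ray attenuation model. It is not the simulator described in the paper. It assumes that the "X-ray principle" can be approximated by Beer-Lambert attenuation, $I = I_0 \exp(-\mu t)$, and that the image array stores transmitted intensity in $[0, 1]$. All function names, the Gaussian lesion mask, the noise-to-thickness mapping, and the parameters `mu`, `radius`, `octaves`, and `persistence` are hypothetical choices made for illustration.

```python
import numpy as np


def perlin_2d(shape, res, rng):
    """One octave of 2D Perlin (gradient) noise on a res[0] x res[1] lattice."""
    h, w = shape
    ry, rx = res
    # Random unit gradient vector at every lattice corner.
    angles = rng.uniform(0.0, 2.0 * np.pi, size=(ry + 1, rx + 1))
    grads = np.stack([np.cos(angles), np.sin(angles)], axis=-1)
    # Pixel coordinates expressed in lattice units.
    ys = np.arange(h) * ry / h
    xs = np.arange(w) * rx / w
    y, x = np.meshgrid(ys, xs, indexing="ij")
    y0, x0 = y.astype(int), x.astype(int)
    fy, fx = y - y0, x - x0

    def corner(dy, dx):
        # Dot product of the corner gradient with the offset from that corner.
        g = grads[y0 + dy, x0 + dx]
        return g[..., 0] * (fx - dx) + g[..., 1] * (fy - dy)

    def fade(t):
        # Perlin's quintic smoothing curve.
        return t * t * t * (t * (6.0 * t - 15.0) + 10.0)

    u, v = fade(fx), fade(fy)
    top = corner(0, 0) * (1.0 - u) + corner(0, 1) * u
    bottom = corner(1, 0) * (1.0 - u) + corner(1, 1) * u
    return top * (1.0 - v) + bottom * v


def fractal_perlin_2d(shape, base_res=(2, 2), octaves=4, persistence=0.5, seed=0):
    """Sum of Perlin octaves with doubling frequency and decaying amplitude."""
    rng = np.random.default_rng(seed)
    noise = np.zeros(shape)
    amplitude, total = 1.0, 0.0
    for o in range(octaves):
        res = (base_res[0] * 2**o, base_res[1] * 2**o)
        noise += amplitude * perlin_2d(shape, res, rng)
        total += amplitude
        amplitude *= persistence
    return noise / total  # normalized by the sum of amplitudes


def add_pseudo_lesion(image, center, radius, mu=1.0, seed=0):
    """Attenuate `image` by a noise-textured blob (hypothetical lesion model)."""
    h, w = image.shape
    noise = fractal_perlin_2d((h, w), seed=seed)
    # Assumed mapping from noise to a non-negative lesion thickness.
    thickness = np.clip(noise + 0.5, 0.0, None)
    # Assumed Gaussian mask that localizes the lesion around `center`.
    yy, xx = np.mgrid[0:h, 0:w]
    mask = np.exp(-((yy - center[0]) ** 2 + (xx - center[1]) ** 2) / (2.0 * radius**2))
    # Beer-Lambert attenuation of the transmitted intensity.
    return image * np.exp(-mu * thickness * mask)


if __name__ == "__main__":
    clean = np.ones((64, 64))  # stand-in for a normalized chest X-ray image
    lesioned = add_pseudo_lesion(clean, center=(32, 32), radius=8.0, mu=0.8, seed=1)
    print(lesioned.shape, float(lesioned.min()), float(lesioned.max()))
```

In a domain randomization setting, parameters such as `mu`, `radius`, `center`, and `seed` would be the natural quantities to sample per training image. How GDR schedules that sampling is not specified in this abstract, so it is not sketched here.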