Rapidly expanding clinical AI applications worldwide have the potential to impact all areas of medical practice. Medical imaging applications constitute the vast majority of approved clinical AI applications. Though healthcare systems are eager to adopt AI solutions, a fundamental question remains: \textit{what happens after the AI model goes into production?} We use the CheXpert and PadChest public datasets to build and test a medical imaging AI drift monitoring workflow that tracks data and model drift without contemporaneous ground truth. We simulate drift in multiple experiments to compare model performance with our novel multi-modal drift metric, which takes as input DICOM metadata, image appearance representations from a variational autoencoder (VAE), and model output probabilities. Through experimentation, we demonstrate that unsupervised distributional shifts in relevant metadata, predicted probabilities, and VAE latent representations serve as a strong proxy for ground truth performance. Our key contributions include (1) a proof of concept for medical imaging drift detection, including the use of a VAE and domain-specific statistical methods; (2) a multi-modal methodology for measuring and unifying drift metrics; (3) new insights into the challenges and solutions for observing deployed medical imaging AI; and (4) open-source tools that enable others to easily run their own workflows or scenarios. This work has important implications for addressing the translation gap related to continuous medical imaging AI model monitoring in dynamic healthcare environments.
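To make the idea of unifying per-modality drift signals concrete, the following is a minimal sketch and not the authors' metric. It assumes each input stream (for example a predicted probability or one VAE latent dimension) is a list of continuous values, measures drift per stream with the two-sample Kolmogorov-Smirnov statistic, and averages the results with equal weights. The function names `ks_statistic` and `unified_drift`, the feature names, and the equal weighting are all illustrative assumptions.

```python
from bisect import bisect_right
from statistics import mean


def ks_statistic(reference, current):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    ref = sorted(reference)
    cur = sorted(current)
    if not ref or not cur:
        raise ValueError("both samples must be non-empty")
    gap = 0.0
    # The largest gap between two step functions occurs at an observed value.
    for x in set(ref) | set(cur):
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_cur = bisect_right(cur, x) / len(cur)
        gap = max(gap, abs(cdf_ref - cdf_cur))
    return gap


def unified_drift(reference, current):
    """Average per-feature KS statistics into one score in [0, 1].

    Both arguments map a (hypothetical) feature name to a list of values.
    Equal weighting is an assumption made for illustration only.
    """
    per_feature = {
        name: ks_statistic(reference[name], current[name]) for name in reference
    }
    return mean(list(per_feature.values())), per_feature


if __name__ == "__main__":
    # Toy data: the probabilities shift completely, the latent stays put.
    reference = {
        "predicted_probability": [0.1, 0.2, 0.3, 0.4],
        "vae_latent_0": [-1.0, 0.0, 1.0, 2.0],
    }
    current = {
        "predicted_probability": [0.6, 0.7, 0.8, 0.9],
        "vae_latent_0": [-1.0, 0.0, 1.0, 2.0],
    }
    score, detail = unified_drift(reference, current)
    print(score, detail)
```

Categorical DICOM metadata (such as a manufacturer tag) would need a different statistic, for instance a chi-square test on category counts, before it could be folded into such a score; how the signals are actually weighted and normalized is not specified here.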