Computer-Aided Diagnosis (CAD) systems for chest radiographs using artificial intelligence (AI) have recently shown great potential as a second opinion for radiologists. The performance of such systems, however, has mostly been evaluated retrospectively on fixed datasets and may therefore be far from their real performance in clinical practice. In this work, we demonstrate a mechanism for validating VinDr-CXR, an AI-based system for detecting abnormalities on chest X-ray scans, at the Phu Tho General Hospital, a provincial hospital in the North of Vietnam. The AI system was trained on a fixed annotated dataset from other sources and then directly integrated into the Picture Archiving and Communication System (PACS) of the hospital. The performance of the system was prospectively measured by matching and comparing the AI results with the radiology reports of 6,285 chest X-ray examinations extracted from the Hospital Information System (HIS) over the last two months of 2020. The normal/abnormal status of each radiology report was determined by a set of rules and served as the ground truth. Our system achieves an F1 score (the harmonic mean of recall and precision) of 0.653 (95% CI 0.635, 0.671) for detecting any abnormality on chest X-rays. Despite a significant drop from the in-lab performance, this result establishes a high level of confidence in applying such a system in real-life situations.
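To make the reported metric concrete, the sketch below computes an F1 score from binary normal/abnormal labels and attaches a 95% confidence interval. The abstract does not state how its interval was obtained; the percentile bootstrap used here is an assumption chosen for illustration, and the function names `f1_score` and `bootstrap_ci` as well as the toy labels are hypothetical.

```python
import random


def f1_score(y_true, y_pred):
    """F1 for binary labels (1 = abnormal, 0 = normal).

    Equal to the harmonic mean of precision and recall,
    written in the equivalent form 2*TP / (2*TP + FP + FN).
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def bootstrap_ci(y_true, y_pred, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for F1 (assumed method, not the paper's)."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_resamples):
        # Resample examinations with replacement, keeping label pairs together.
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(f1_score([y_true[i] for i in idx],
                               [y_pred[i] for i in idx]))
    scores.sort()
    lower = scores[int((alpha / 2) * n_resamples)]
    upper = scores[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lower, upper


# Toy labels for illustration only; these are not the study data.
report_labels = [1, 1, 1, 0, 0, 1, 0, 0, 1, 0]   # rule-derived ground truth
ai_labels     = [1, 1, 0, 1, 0, 1, 0, 0, 1, 0]   # AI normal/abnormal output
point = f1_score(report_labels, ai_labels)
low, high = bootstrap_ci(report_labels, ai_labels)
print(f"F1 = {point:.3f} (95% CI {low:.3f}, {high:.3f})")
```

With only a handful of examinations the interval is wide; it narrows as the number of matched report and AI pairs grows, which is consistent with the tight interval reported for 6,285 examinations.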