Machine learning (ML) has been applied to pathology images in research and clinical practice with promising outcomes. However, standard ML models often lack the rigorous evaluation required for clinical decisions. ML techniques developed for natural images are ill-equipped to deal with pathology images, which are very large and noisy, require expensive labeling, are hard to interpret, and are susceptible to spurious correlations. We propose a set of practical guidelines for ML evaluation in pathology that address these concerns. The paper covers how to set up the evaluation framework, how to deal effectively with variability in labels, and a recommended suite of tests addressing domain shift, robustness, and confounding variables. We hope that the proposed framework will bridge the gap between ML researchers and domain experts, leading to wider adoption of ML techniques in pathology and improving patient outcomes.