Purpose - To characterise and assess the quality of published research evaluating artificial intelligence (AI) methods for ovarian cancer diagnosis or prognosis using histopathology data.
Methods - A search of 5 sources was conducted up to 01/12/2022. To be included, a study had to evaluate AI on histopathology images for diagnostic or prognostic inferences in ovarian cancer, including tubo-ovarian and peritoneal tumours. Reviews and non-English language articles were excluded. The risk of bias was assessed for every included model using PROBAST.
Results - A total of 1434 research articles were identified, of which 36 were eligible for inclusion. These studies reported 62 models of interest, including 35 classifiers, 14 survival prediction models, 7 segmentation models, and 6 regression models. Models were developed using 1 to 1375 slides from 1 to 664 ovarian cancer patients. A wide array of outcomes was predicted, including overall survival (9/62), histological subtypes (7/62), stain quantity (6/62), and malignancy (5/62). Older studies used traditional machine learning (ML) models with hand-crafted features, while newer studies typically employed deep learning (DL) to automatically learn features and predict the outcome(s) of interest. All models were found to be at high or unclear risk of bias overall. Research was frequently limited by insufficient reporting, small sample sizes, and insufficient validation.
Conclusion - Limited research has been conducted, and none of the associated models has been demonstrated to be ready for real-world implementation. Recommendations are provided to address underlying biases and flaws in study design, which should help inform higher-quality, reproducible future research. Key aspects include more transparent and comprehensive reporting, and improved performance evaluation using cross-validation and external validation.