The current medical standard for diagnosing oral cancer (OC) is histological examination of a tissue sample from the oral cavity. This process is time-consuming and more invasive than the alternative of acquiring a brush sample followed by cytological analysis. Skilled cytotechnologists are able to detect changes due to malignancy; however, introducing this approach into clinical routine is hampered by challenges such as a lack of experts and the labour-intensive nature of the work. To design a trustworthy OC detection system that would assist cytotechnologists, we are interested in AI-based methods that can reliably detect cancer given only per-patient labels (minimizing annotation bias) and that also provide information on which cells are most relevant for the diagnosis (enabling supervision and understanding). We therefore compare a conventional single instance learning (SIL) approach with a modern multiple instance learning (MIL) method suitable for OC detection and interpretation, using three different neural network architectures. To facilitate systematic evaluation of the considered approaches, we introduce a synthetic dataset, PAP-QMNIST, that serves as a model of OC data while offering access to per-instance ground truth. Our study indicates that on PAP-QMNIST, the SIL approach performs better, on average, than the MIL approach. Performance at the bag (per-patient) level on real-world cytological data is similar for both methods, yet the SIL approach performs better on average. Visual examination by a cytotechnologist indicates that the methods manage to identify cells that deviate from normality, including malignant cells as well as those suspicious for dysplasia. We share the code as open source at https://github.com/MIDA-group/OralCancerMILvsSIL
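To make the distinction between the two approaches concrete, the following is a minimal sketch of how a patient-level decision could be formed in each case. It is an illustration under stated assumptions, not the implementation in the linked repository: the mean aggregation rule for SIL, the attention-style pooling for MIL, and the names `sil_bag_score`, `attention_mil_pool`, `top_k_cells`, `v` and `w` are all hypothetical choices made for this example.

```python
import numpy as np


def sil_bag_score(instance_probs):
    """SIL (assumed aggregation): each cell is scored independently and the
    per-cell probabilities are averaged into one patient-level score."""
    instance_probs = np.asarray(instance_probs, dtype=float)
    return float(instance_probs.mean())


def attention_mil_pool(features, v, w):
    """MIL with attention-style pooling (assumed variant).

    features: (n_cells, d) per-cell feature vectors of one patient (one bag)
    v:        (d, h) hypothetical projection matrix
    w:        (h,)   hypothetical attention vector
    Returns the bag embedding (d,) and the per-cell attention weights (n_cells,).
    """
    features = np.asarray(features, dtype=float)
    scores = np.tanh(features @ v) @ w      # one raw score per cell
    scores = scores - scores.max()          # stabilise the softmax
    weights = np.exp(scores) / np.exp(scores).sum()
    bag_embedding = weights @ features      # attention-weighted mean of cells
    return bag_embedding, weights


def top_k_cells(weights, k=3):
    """Indices of the k cells with the highest weight, most relevant first."""
    return np.argsort(weights)[::-1][:k]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cells = rng.normal(size=(10, 4))        # toy bag: 10 cells, 4 features each
    v = rng.normal(size=(4, 8))
    w = rng.normal(size=8)
    _, attn = attention_mil_pool(cells, v, w)
    print("most relevant cells:", top_k_cells(attn, k=3))
    print("SIL patient score:", sil_bag_score(rng.uniform(size=10)))
```

In both cases only the per-patient label is needed for training. In this sketch, the per-cell probabilities (SIL) or the attention weights (MIL) are what would be used to rank cells for review by a cytotechnologist.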