Medical image segmentation annotations suffer from inter- and intra-observer variation, even among experts, due to intrinsic differences between human annotators and to ambiguous boundaries. Aggregating the opinions of several annotators for an image is a natural way to estimate a gold standard. Although training deep models in a supervised setting with a single annotation per image has been studied extensively, generalizing their training to datasets containing multiple annotations per image remains a relatively unexplored problem. In this paper, we propose an approach for handling annotator disagreement when training a deep model. To this end, we propose an ensemble of Bayesian fully convolutional networks (FCNs) for the segmentation task that addresses two major factors in the aggregation of multiple ground-truth annotations: (1) handling contradictory annotations in the training data arising from inter-annotator disagreement and (2) improving confidence calibration through the fusion of the base models' predictions. We demonstrate the superior performance of our approach on the ISIC Archive and explore the generalization of the proposed method through cross-dataset evaluation on the PH2 and DermoFit datasets.
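The prediction-fusion step can be pictured as follows. This is a minimal sketch, not the authors' code: it assumes PyTorch FCNs whose Bayesian behavior is approximated with Monte Carlo dropout, and the helper names `mc_dropout_predict` and `ensemble_predict` are hypothetical.

```python
import torch

def mc_dropout_predict(model, image, n_samples=10):
    """Average softmax maps over stochastic forward passes (MC dropout).

    Keeping dropout active at inference time yields samples from an
    approximate posterior over the network's predictions.
    """
    model.train()  # keep dropout layers active
    with torch.no_grad():
        probs = torch.stack([
            torch.softmax(model(image), dim=1) for _ in range(n_samples)
        ])
    return probs.mean(dim=0)  # (B, C, H, W) mean probability map

def ensemble_predict(models, image, n_samples=10):
    """Fuse the base models' MC-dropout predictions by averaging.

    Averaging probability maps across base models is one simple fusion
    rule that tends to improve confidence calibration over any single
    model's output.
    """
    fused = torch.stack([
        mc_dropout_predict(m, image, n_samples) for m in models
    ]).mean(dim=0)
    return fused.argmax(dim=1), fused  # segmentation mask, probabilities
```

Under these assumptions, the fused probability map can be thresholded or argmaxed for the final mask, while its per-pixel entropy serves as an uncertainty estimate.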