Human ratings are abstract representations of segmentation quality. To approximate human quality ratings on scarce expert data, we train surrogate quality estimation models. We evaluate on a complex multi-class segmentation problem, specifically glioma segmentation, following the BraTS annotation protocol. The training data features quality ratings from 15 expert neuroradiologists on a scale ranging from 1 to 6 stars for various computer-generated and manual 3D annotations. Even though the networks operate on 2D images and are trained on scarce data, we can approximate segmentation quality within a margin of error comparable to human intra-rater reliability. Segmentation quality prediction has broad applications. An understanding of segmentation quality is imperative for the successful clinical translation of automatic segmentation algorithms, and it can also play an essential role in training new segmentation models. Because inference takes only a fraction of a second, quality prediction can be applied directly within a loss function or as a fully automatic dataset curation mechanism in a federated learning setting.
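As an illustration of the dataset curation use case, the following minimal sketch filters cases by a predicted star rating. It is not the authors' implementation: the `rate_slice` callable stands in for a trained surrogate quality model, averaging ratings over 2D views is an assumed aggregation, and the `min_stars` threshold of 4.0 is an arbitrary placeholder.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

# Hypothetical layout: a case is (case_id, image_views, mask_views),
# where each view is one 2D image with its 2D segmentation.
Case = Tuple[str, Sequence, Sequence]


def predicted_case_rating(
    image_views: Sequence,
    mask_views: Sequence,
    rate_slice: Callable[[object, object], float],
) -> float:
    """Combine per-view star ratings into one rating for a 3D case.

    Averaging is an assumption of this sketch, not the paper's method.
    """
    if len(image_views) == 0 or len(image_views) != len(mask_views):
        raise ValueError("need the same non-zero number of images and masks")
    ratings = [rate_slice(img, msk) for img, msk in zip(image_views, mask_views)]
    # Clamp predictions to the 1 to 6 star scale used by the expert raters.
    clamped = [min(6.0, max(1.0, r)) for r in ratings]
    return sum(clamped) / len(clamped)


def curate(
    cases: Iterable[Case],
    rate_slice: Callable[[object, object], float],
    min_stars: float = 4.0,  # arbitrary placeholder threshold
) -> List[str]:
    """Return the ids of cases whose predicted rating reaches min_stars."""
    kept = []
    for case_id, image_views, mask_views in cases:
        rating = predicted_case_rating(image_views, mask_views, rate_slice)
        if rating >= min_stars:
            kept.append(case_id)
    return kept
```

In a federated setting, each site could run such a filter locally on its own annotations before training, so that no ratings or images need to leave the site. The same `rate_slice` model could in principle also supply a training signal for a segmentation network, provided it is differentiable.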