Image annotation is an essential preliminary step for enabling data-driven algorithms. In medical imaging, large and reliably annotated data sets are crucial for recognizing various diseases robustly. However, annotator performance varies immensely and thus affects model training. Therefore, multiple annotators should often be employed, which is expensive and resource-intensive. Hence, it is desirable for users to annotate unseen data while an automated system unobtrusively rates their performance during this process. We examine such a system based on whole slide images (WSIs) showing lung fluid cells. We evaluate two methods for the generation of synthetic individual cell images: conditional Generative Adversarial Networks and Diffusion Models (DM). For qualitative and quantitative evaluation, we conduct a user study to highlight the suitability of the generated cells. Users could not detect 52.12% of the images generated by DM, proving the feasibility of replacing original cells with synthetic cells without being noticed.