This work introduces SkinGenBench, a systematic biomedical imaging benchmark that investigates how preprocessing complexity interacts with generative model choice for synthetic dermoscopic image augmentation and downstream melanoma diagnosis. Using a curated dataset of 14,116 dermoscopic images from HAM10000 and MILK10K across five lesion classes, we evaluate two representative generative paradigms, StyleGAN2-ADA and Denoising Diffusion Probabilistic Models (DDPMs), under basic geometric augmentation and advanced artifact removal pipelines. Synthetic melanoma images are assessed using established perceptual and distributional metrics (FID, KID, IS), feature space analysis, and their impact on diagnostic performance across five downstream classifiers. Experimental results demonstrate that generative architecture choice has a stronger influence on both image fidelity and diagnostic utility than preprocessing complexity. StyleGAN2-ADA consistently produced synthetic images more closely aligned with real data distributions, achieving the lowest FID (~65.5) and KID (~0.05), while diffusion models generated higher-variance samples at the cost of reduced perceptual fidelity and class anchoring. Advanced artifact removal yielded only marginal improvements in generative metrics and provided limited downstream diagnostic gains, suggesting possible suppression of clinically relevant texture cues. In contrast, synthetic data augmentation substantially improved melanoma detection, with 8-15% absolute gains in melanoma F1-score; ViT-B/16 achieved F1~0.88 and ROC-AUC~0.98, an improvement of approximately 14% over non-augmented baselines. Our code can be found at https://github.com/adarsh-crafts/SkinGenBench
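To make the KID figure quoted in the abstract more concrete, the following is a minimal sketch of the standard unbiased Kernel Inception Distance estimator (squared MMD with a cubic polynomial kernel). It is not taken from the SkinGenBench code: the function names `poly_kernel` and `kid` and the toy 2-D feature vectors are illustrative assumptions. In practice the features would be Inception embeddings of real and synthetic images, and the estimate is usually averaged over several random subsets.

```python
from itertools import combinations


def poly_kernel(x, y):
    """Cubic polynomial kernel k(x, y) = (x.y / d + 1)^3, the usual KID kernel."""
    d = len(x)
    dot = sum(a * b for a, b in zip(x, y))
    return (dot / d + 1.0) ** 3


def kid(real, fake):
    """Unbiased squared-MMD estimate between two lists of feature vectors.

    Hypothetical helper for illustration; inputs are plain lists of floats.
    """
    m, n = len(real), len(fake)
    if m < 2 or n < 2:
        raise ValueError("each set needs at least two feature vectors")
    # Within-set terms exclude i == j, which is what makes the estimator unbiased.
    k_rr = 2.0 * sum(poly_kernel(a, b) for a, b in combinations(real, 2)) / (m * (m - 1))
    k_ff = 2.0 * sum(poly_kernel(a, b) for a, b in combinations(fake, 2)) / (n * (n - 1))
    # Cross term averages the kernel over all real/fake pairs.
    k_rf = sum(poly_kernel(a, b) for a in real for b in fake) / (m * n)
    return k_rr + k_ff - 2.0 * k_rf


# Toy features (assumed values, not real Inception embeddings).
real_feats = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
near_feats = [[0.1, 0.0], [0.9, 0.1], [0.0, 0.9], [1.0, 1.1]]
far_feats = [[3.0, 3.0], [4.0, 3.0], [3.0, 4.0], [4.0, 4.0]]

if __name__ == "__main__":
    print("KID(real, near):", kid(real_feats, near_feats))
    print("KID(real, far): ", kid(real_feats, far_feats))
```

A lower value indicates that the synthetic feature distribution sits closer to the real one, which is the sense in which the abstract reports StyleGAN2-ADA as achieving the lowest KID. Because the estimator is unbiased, it can come out slightly negative for very similar sets.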