While artificial intelligence (AI) holds promise for supporting healthcare providers and improving the accuracy of medical diagnoses, a lack of transparency in the composition of datasets exposes AI models to the possibility of unintentional and avoidable mistakes. In particular, public and private image datasets of dermatological conditions rarely include information on skin color. As a start towards increasing transparency, AI researchers have repurposed the Fitzpatrick skin type (FST), originally a measure of patient photosensitivity, as a measure for estimating skin tone in algorithmic audits of computer vision applications including facial recognition and dermatology diagnosis. To understand the variability of estimated FST annotations on images, we compare several FST annotation methods on a diverse set of 460 images of skin conditions from both textbooks and online dermatology atlases. We find the inter-rater reliability between three board-certified dermatologists is comparable to the inter-rater reliability between the board-certified dermatologists and two crowdsourcing methods. In contrast, we find that the Individual Typology Angle converted to FST (ITA-FST) method produces annotations that are significantly less correlated with the experts' annotations than the experts' annotations are correlated with each other. These results demonstrate that algorithms based on ITA-FST are not reliable for annotating large-scale image datasets, but human-centered, crowd-based protocols can reliably add skin type transparency to dermatology datasets. Furthermore, we introduce the concept of dynamic consensus protocols with tunable parameters, including expert review, that increase the visibility of crowdwork and provide guidance for future crowdsourced annotations of large image datasets.
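For readers unfamiliar with the ITA-FST method named above, the following is a minimal sketch of how such a conversion is commonly set up, not the exact procedure evaluated here. It uses the standard definition of the Individual Typology Angle, ITA = arctan((L* - 50) / b*) in degrees, computed from CIELAB values. The function names are hypothetical, the inputs are assumed to be mean L* and b* values over pixels already identified as skin, and the one-to-one pairing of the conventional ITA category boundaries with FST I to VI is an illustrative assumption.

```python
import math

# Conventional ITA category boundaries in degrees, lightest to darkest.
# ASSUMPTION: pairing these six categories one-to-one with FST I-VI is
# for illustration only; other threshold sets are used in practice.
ITA_THRESHOLDS = [(55.0, 1), (41.0, 2), (28.0, 3), (10.0, 4), (-30.0, 5)]


def individual_typology_angle(l_star: float, b_star: float) -> float:
    """Return the ITA in degrees from CIELAB L* and b* values.

    atan2 matches arctan((L* - 50) / b*) whenever b* > 0, which is the
    usual case for skin, and avoids division by zero when b* == 0.
    """
    return math.degrees(math.atan2(l_star - 50.0, b_star))


def ita_to_fst(ita_degrees: float) -> int:
    """Map an ITA value to an estimated FST in 1..6 (hypothetical mapping)."""
    for lower_bound, fst in ITA_THRESHOLDS:
        if ita_degrees > lower_bound:
            return fst
    return 6  # anything at or below -30 degrees


if __name__ == "__main__":
    # Hypothetical mean skin-pixel values for a single image.
    ita = individual_typology_angle(l_star=70.0, b_star=20.0)
    print(f"ITA = {ita:.1f} degrees, estimated FST = {ita_to_fst(ita)}")
```

Because the estimate depends only on pixel lightness and the yellow-blue component, it is sensitive to lighting, camera settings, and the presence of lesions in the sampled region, which is one plausible reason such annotations can diverge from expert judgment.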