Deep learning models have shown great potential for image-based diagnosis to assist clinical decision making. At the same time, an increasing number of reports raise concerns that machine learning could amplify existing health disparities owing to human biases embedded in the training data. If we wish to build fair artificial intelligence systems, it is important to investigate carefully the extent to which such biases may be reproduced or even amplified. Seyyed-Kalantari et al. advance this conversation by analysing the performance of a disease classifier across population subgroups. They raise performance disparities related to underdiagnosis as a point of concern; we identify areas of this analysis that we believe deserve further attention. Specifically, we wish to highlight some theoretical and practical difficulties associated with assessing model fairness by testing on data drawn from the same biased distribution as the training data, especially when the sources and extent of the bias are unknown.
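As a minimal sketch of the kind of subgroup analysis discussed here (not code from the original comment), the snippet below computes per-subgroup underdiagnosis rates, that is, the false-negative rate of a binary disease classifier within each population subgroup. All names (`y_true`, `y_pred`, `group`) and the synthetic data are hypothetical and for illustration only.

```python
import numpy as np

def underdiagnosis_rates(y_true, y_pred, group):
    """Per-subgroup false-negative rate: P(prediction = 0 | label = 1, subgroup)."""
    rates = {}
    for g in np.unique(group):
        mask = (group == g) & (y_true == 1)           # diseased cases in subgroup g
        if mask.sum() == 0:
            continue
        rates[g] = float(np.mean(y_pred[mask] == 0))  # fraction missed (underdiagnosed)
    return rates

# Toy example with synthetic data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)                # recorded labels (possibly biased)
y_pred = rng.integers(0, 2, size=1000)                # model predictions
group = rng.choice(["A", "B"], size=1000)             # e.g. sex, race or insurance type
print(underdiagnosis_rates(y_true, y_pred, group))

# Caveat, echoing the point raised in the text: if y_true is produced by the same
# biased labelling process as the training data, differences (or their absence) in
# these rates may reflect label bias rather than the model's true fairness.
```

The caveat in the final comment is the crux of the argument that follows: subgroup metrics computed against test labels drawn from the same biased distribution cannot, on their own, establish whether the model itself is fair.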