Computer Vision (CV) has achieved remarkable results, outperforming humans in several tasks. Nonetheless, if not handled with proper care, it may result in major discrimination. CV systems depend heavily on the data they are fed and can learn and amplify biases within such data. Thus, the problems of both understanding and discovering biases are of utmost importance. Yet, to date there is no comprehensive survey on bias in visual datasets. To this end, this work aims to: i) describe the biases that can affect visual datasets; ii) review the literature on methods for bias discovery and quantification in visual datasets; iii) discuss existing attempts to collect bias-aware visual datasets. A key conclusion of our study is that the problem of bias discovery and quantification in visual datasets is still open, and there is room for improvement in terms of both methods and the range of biases that can be addressed. Moreover, there is no such thing as a bias-free dataset, so scientists and practitioners must become aware of the biases in their datasets and make them explicit. To this end, we propose a checklist that can be used to spot different types of bias during visual dataset collection.
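As a minimal illustration of what "bias quantification" can mean in practice, the sketch below measures representation skew: how far the subgroup proportions in a dataset's annotations deviate from a balanced split. The function name, the toy labels, and the choice of maximum deviation as the skew measure are illustrative assumptions, not a method from the surveyed literature.

```python
from collections import Counter

def representation_skew(attributes):
    """Maximum deviation of subgroup proportions from a uniform split.

    Returns 0.0 for a perfectly balanced attribute; values approaching
    1.0 indicate that a single subgroup dominates the dataset.
    NOTE: a toy measure for illustration only.
    """
    counts = Counter(attributes)
    n = len(attributes)
    uniform = 1.0 / len(counts)
    return max(abs(c / n - uniform) for c in counts.values())

# Toy per-image demographic annotations: 20 vs. 80 out of 100 images.
labels = ["female"] * 20 + ["male"] * 80
print(round(representation_skew(labels), 3))  # 0.3 (0.8 vs. a balanced 0.5)
```

A real audit would go well beyond marginal counts (e.g., examining label quality, capture conditions, and co-occurrence of attributes), but even a crude marginal statistic like this can flag datasets that warrant closer inspection.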