Computer Vision (CV) has achieved remarkable results, outperforming humans in several tasks. Nonetheless, it may lead to significant discrimination if not handled properly, as CV systems depend heavily on the data they are fed and can learn and amplify the biases within such data. The problems of understanding and discovering biases are therefore of utmost importance. Yet, no comprehensive survey on bias in visual datasets exists. Hence, this work aims to: i) describe the biases that can manifest in visual datasets; ii) review the literature on methods for bias discovery and quantification in visual datasets; iii) discuss existing attempts to collect bias-aware visual datasets. A key conclusion of our study is that the problem of bias discovery and quantification in visual datasets is still open, with room for improvement in both the methods and the range of biases that can be addressed. Moreover, there is no such thing as a bias-free dataset, so scientists and practitioners must become aware of the biases in their datasets and make them explicit. To this end, we propose a checklist to help spot different types of bias during visual dataset collection.