Human-centric image datasets are critical to the development of computer vision technologies. However, recent investigations have foregrounded significant ethical issues related to privacy and bias, which have resulted in the complete retraction, or modification, of several prominent datasets. Subsequent work has tried to reverse this trend, for example, by proposing analytical frameworks for ethically evaluating datasets, standardized dataset documentation and curation practices, privacy-preservation methodologies, and tools for surfacing and mitigating representational biases. Little attention, however, has been paid to the realities of operationalizing ethical data collection. To fill this gap, we present a set of key ethical considerations and practical recommendations for collecting human-centric image data in a more ethically minded way. Our research directly addresses issues of privacy and bias by contributing best practices for ethical data collection to the research community, covering purpose, privacy and consent, and diversity. We motivate each consideration by drawing on lessons from current practices, dataset withdrawals and audits, and analytical ethical frameworks. Our research is intended to augment recent scholarship and represents an important step toward more responsible data curation practices.