Recent developments in machine learning have shown that successful models rely not only on large amounts of data but also on the right kind of data. In this paper, we show how this data-centric approach can be applied in a decentralized manner to enable efficient data collection for algorithms. Face detectors are a class of models that suffer heavily from bias, since they must work across a wide variety of data. We propose a face detection and anonymization approach that uses a hybrid MultiTask Cascaded CNN with FaceNet embeddings, and we use it to benchmark multiple datasets, describing and evaluating the bias in the models toward different ethnicities, genders, and age groups. We also describe ways to improve fairness through a decentralized system in which users label, correct, and verify data, creating a robust pipeline for model retraining.
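As a rough illustration of what evaluating detector bias across demographic groups can look like, the sketch below computes a per-group detection rate and the largest gap between groups. It is not the paper's evaluation protocol: the function names, the group labels, and the toy records are all hypothetical, and the detection outcomes are assumed to come from some face detector run on annotated images.

```python
from collections import defaultdict


def detection_rate_by_group(records):
    """Return {group: fraction of annotated faces that were detected}.

    `records` is an iterable of (group, detected) pairs, one per annotated
    face. Both the record layout and the group labels are assumptions made
    for this sketch.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for group, detected in records:
        totals[group] += 1
        hits[group] += int(bool(detected))
    return {group: hits[group] / totals[group] for group in totals}


def max_rate_gap(rates):
    """Largest difference in detection rate between any two groups."""
    return max(rates.values()) - min(rates.values())


# Toy, made-up outcomes used only to exercise the functions.
records = [
    ("group_a", True), ("group_a", True), ("group_a", False), ("group_a", True),
    ("group_b", True), ("group_b", False), ("group_b", False), ("group_b", True),
]

rates = detection_rate_by_group(records)
gap = max_rate_gap(rates)
print(rates, gap)
```

A single gap figure is a coarse summary. In practice one would likely report the per-group rates alongside the number of samples in each group, since small groups give noisy estimates.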