In practice, and especially when training deep neural networks, visual recognition rules are often learned from several sources of information. Meanwhile, the recent deployment of facial recognition systems whose performance varies across population segments has highlighted the representativeness issues that a naive aggregation of image datasets can induce. In this paper, we show how biasing models, which describe how the distribution generating each dataset is distorted relative to the target distribution, can be used to remedy these problems. Based on (approximate) knowledge of the biasing mechanisms at work, our approach reweights the observations so as to form a nearly debiased estimator of the target distribution. One key condition is that the supports of the biased distributions must partly overlap and jointly cover the support of the target distribution. To meet this requirement in practice, we propose to use a low-dimensional image representation shared across the image databases. Finally, we provide numerical experiments highlighting the relevance of our approach.
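The reweighting idea can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the paper. There are K datasets, and dataset k, of size n_k, is drawn from the biased distribution P_k(dx) = ω_k(x) P(dx) / Ω_k. The biasing function ω_k is known, and the normalizing constant Ω_k = E_P[ω_k(X)] is unknown. All samples are pooled into n = Σ_k n_k observations. Each pooled observation x_i then receives a weight proportional to 1 / Σ_k (n_k / n) ω_k(x_i) / Ω_k, and the Ω_k are estimated by a fixed-point iteration in the style of Vardi's classical estimator for biased sampling models. This is a generic illustration of that estimator, not necessarily the paper's exact procedure. The function name `debias_weights` and its parameters are hypothetical.

```python
import numpy as np


def debias_weights(omega, sizes, n_iter=500, tol=1e-10):
    """Hypothetical helper: debiasing weights for a pooled biased sample.

    Assumption: omega[i, k] = omega_k(x_i), the biasing function of dataset k
    evaluated at pooled observation i, is known; sizes[k] = n_k.
    Returns (weights, Omega): nonnegative weights summing to 1 that define a
    reweighted empirical estimate of the target distribution P, and the
    estimated constants Omega_k = E_P[omega_k(X)].
    """
    omega = np.asarray(omega, dtype=float)
    sizes = np.asarray(sizes, dtype=float)
    n, K = omega.shape
    if sizes.shape != (K,) or sizes.sum() != n:
        raise ValueError("sizes must have length K and sum to the number of rows")
    lam = sizes / n          # pooled sampling proportions n_k / n
    Omega = np.ones(K)       # initial guess for the normalizing constants

    def weights_given(Om):
        # Density of the pooled sampling distribution relative to P at each x_i:
        # sum_k lam_k * omega_k(x_i) / Om_k; the debiasing weight is its inverse.
        mix = omega @ (lam / Om)
        w = 1.0 / mix
        return w / w.sum()

    for _ in range(n_iter):
        w = weights_given(Omega)
        new_Omega = w @ omega  # Omega_k = E_P[omega_k] under the current weights
        if np.any(new_Omega <= 0):
            # Corresponds to a failure of the support-overlap condition.
            raise ValueError("a biasing function vanishes on the whole pooled sample")
        converged = np.max(np.abs(new_Omega - Omega) / new_Omega) < tol
        Omega = new_Omega
        if converged:
            break
    return weights_given(Omega), Omega
```

As a toy usage example, suppose dataset A is unbiased (ω_A = 1) and dataset B only ever contains class-1 images (ω_B(x) = 1 if x is class 1, else 0). Each dataset has two images: A holds one class-0 and one class-1 image, and B holds two class-1 images. Then `debias_weights([[1, 0], [1, 1], [1, 1], [1, 1]], [2, 2])` down-weights the over-represented class-1 images, so each class ends up with total weight 0.5. The resulting weights could multiply per-example losses when training a classifier. The iteration cannot proceed when some ω_k vanishes on the whole pooled sample, which is the sketch's counterpart of the overlap condition stated in the abstract.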