Robots and artificial agents that interact with humans should be able to do so without bias or inequity, yet facial perception systems have notoriously been found to perform worse for some groups of people than for others. In our work, we aim to build a system that perceives humans in a more transparent and inclusive manner. Specifically, we focus on dynamic expressions of the human face, which are difficult to collect for a broad set of people because faces are inherently identifiable and raise privacy concerns. Furthermore, datasets collected from the Internet are not necessarily representative of the general population. We address this problem with a Sim2Real approach in which a suite of 3D simulated human models enables us to create an auditable synthetic dataset covering 1) underrepresented facial expressions beyond the six basic emotions, such as confusion; 2) ethnic and gender minority groups; and 3) the wide range of viewing angles at which a robot may encounter a human in the real world. By augmenting a small dynamic emotional expression dataset containing 123 samples with a synthetic dataset containing 4536 samples, we improved accuracy by 15% on our own dataset and by 11% on an external benchmark dataset, compared to the performance of the same model architecture trained without synthetic data. We also show that this additional step improves accuracy specifically for racial minorities when the architecture's feature-extraction weights are trained from scratch.
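To make the augmentation step concrete, the following is a minimal sketch of how the real and synthetic datasets could be combined for training, assuming a PyTorch pipeline; the tensor shapes, the 7-class label space, and all variable names are hypothetical illustrations, not the paper's actual implementation.

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset

# Hypothetical stand-ins for the two datasets described above: a small
# real dynamic-expression dataset (123 samples) and a large synthetic
# one (4536 samples). Shapes are illustrative only:
# (clips, frames, channels, height, width).
real_x = torch.randn(123, 16, 3, 112, 112)
real_y = torch.randint(0, 7, (123,))      # e.g. 6 basic emotions + confusion
synth_x = torch.randn(4536, 16, 3, 112, 112)
synth_y = torch.randint(0, 7, (4536,))

real_ds = TensorDataset(real_x, real_y)
synth_ds = TensorDataset(synth_x, synth_y)

# Sim2Real augmentation: train on the union of synthetic and real samples,
# so the scarce real data is supplemented by auditable synthetic coverage.
train_ds = ConcatDataset([synth_ds, real_ds])
loader = DataLoader(train_ds, batch_size=32, shuffle=True)

for clips, labels in loader:
    pass  # forward/backward pass of the expression-recognition model
```

Because `ConcatDataset` simply chains the two sources, the same loop can be rerun with `train_ds = real_ds` alone to reproduce the no-synthetic-data baseline the comparison above refers to.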