Can deep learning models achieve greater generalization if their training is guided by reference to human perceptual abilities? And how can this be implemented in practice? This paper proposes a training strategy to ConveY Brain Oversight to Raise Generalization (CYBORG). This new approach incorporates human-annotated saliency maps into a CYBORG loss function that guides the model toward learning features from image regions that humans find salient for the task. The Class Activation Mapping (CAM) mechanism is used to probe the model's current saliency in each training batch, compare this model saliency with human saliency, and penalize large differences. Results on the task of synthetic face detection, selected to illustrate the effectiveness of the approach, show that CYBORG leads to significant improvements in accuracy on unseen samples, consisting of face images generated by six different Generative Adversarial Networks, across multiple classification network architectures. We also show that training with up to seven times more data under a standard loss cannot match CYBORG's accuracy. As a side effect, we observe that asking humans to annotate salient regions during synthetic face detection increased their own classification performance. This work opens a new area of research on how to incorporate human visual saliency into loss functions in practice. All data, code, and pre-trained models used in this work are offered with this paper.
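The loss described above can be illustrated with a minimal sketch: a standard classification loss is blended with a penalty on the difference between the human saliency map and the model's CAM. The `alpha` weighting, min-max normalization, and L1 distance below are assumptions for illustration, not necessarily the paper's exact formulation.

```python
import numpy as np

def normalize(saliency):
    """Min-max normalize a saliency map to [0, 1] (assumed preprocessing)."""
    saliency = saliency - saliency.min()
    return saliency / (saliency.max() + 1e-8)

def cyborg_loss(classification_loss, human_saliency, model_cam, alpha=0.5):
    """Blend a standard classification loss with a human-saliency
    alignment penalty. Illustrative sketch: the blend weight `alpha`
    and the L1 penalty are assumptions."""
    h = normalize(human_saliency)
    m = normalize(model_cam)
    # Penalize large differences between human and model saliency.
    saliency_penalty = np.mean(np.abs(h - m))
    return alpha * classification_loss + (1.0 - alpha) * saliency_penalty
```

When the model's CAM exactly matches the human annotation, the penalty term vanishes and the loss reduces to the weighted classification term; as the two maps diverge, the penalty pushes the model back toward human-salient regions.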