Deep learning has driven remarkable accuracy gains in many computer vision problems. One ongoing challenge is how to achieve the greatest accuracy when training data is limited. A second ongoing challenge is that trained models often do not generalize well, even to new data that is subjectively similar to the training set. We address these challenges in a novel way, with the first-ever (to our knowledge) exploration of encoding human judgement about salient regions of images into the training data. We compare the accuracy and generalization of a state-of-the-art deep learning algorithm for a difficult problem in biometric presentation attack detection when trained on (a) original images with typical data augmentations, and (b) the same original images transformed to encode human judgement about salient image regions. The latter approach results in models that achieve higher accuracy and better generalization, decreasing the error of the LivDet-Iris 2020 winner from 29.78% to 16.37% (a relative reduction of about 45%), and achieving impressive generalization in a leave-one-attack-type-out evaluation scenario. This work opens a new area of study for how to embed human intelligence into training strategies for deep learning to achieve high accuracy and generalization in cases of limited training data.