The human ability to recognize whether an object is known or novel currently outperforms all open set recognition algorithms. Human perception, as measured by the methods and procedures of visual psychophysics, can provide an additional data stream for managing novelty in visual recognition tasks in computer vision. For instance, reaction times measured from human subjects can offer insight into whether a known-class sample may be confused with a novel one. In this work, we designed and performed a large-scale behavioral experiment that collected over 200,000 human reaction time measurements associated with object recognition. The collected data indicate that reaction time varies meaningfully across objects at the sample level. We therefore designed a new psychophysical loss function that enforces consistency with human behavior in deep networks that exhibit variable reaction time for different images. As in biological vision, this approach allows us to achieve good open set recognition performance in regimes with limited labeled training data. In experiments using data from ImageNet, training Multi-Scale DenseNets with this new formulation yields significant improvements: top-1 validation accuracy improves by 7%, top-1 test accuracy on known samples by 18%, and top-1 test accuracy on unknown samples by 33%. We compared our method to 10 open set recognition methods from the literature, all of which it outperformed on multiple metrics.