Current deep learning methods for object recognition are purely data-driven and require a large number of training samples to achieve good results. Due to their sole dependence on image data, these methods tend to fail when confronted with new environments where even small deviations occur. Human perception, however, has proven to be significantly more robust to such distribution shifts. It is assumed that the human ability to deal with unknown scenarios is based on extensive incorporation of contextual knowledge. Context can be based either on object co-occurrences in a scene or on memories of experience. In accordance with the human visual cortex, which uses context to form different object representations for a seen image, we propose an approach that enhances deep learning methods with external contextual knowledge encoded in a knowledge graph. To this end, we extract different contextual views from a generic knowledge graph, transform each view into vector space, and infuse the resulting embeddings into a DNN. We conduct a series of experiments to investigate the impact of different contextual views on the learned object representations for the same image dataset. The experimental results provide evidence that the contextual views influence the image representations in the DNN differently and therefore lead to different predictions for the same images. We also show that context helps to strengthen the robustness of object recognition models for out-of-distribution images, which typically occur in transfer learning tasks or real-world scenarios.
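To make the fusion idea concrete, below is a minimal sketch (PyTorch) of how knowledge-graph-derived class embeddings from one contextual view could be infused into a DNN: image features are projected into the embedding space of the view, and classification becomes a similarity match against the context-conditioned class vectors. All names here (`ContextFusionNet`, `kg_view_embeddings`) are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class ContextFusionNet(nn.Module):
    """Illustrative fusion of CNN features with KG view embeddings (assumption, not the paper's code)."""

    def __init__(self, kg_view_embeddings: torch.Tensor, num_classes: int):
        super().__init__()
        assert kg_view_embeddings.shape[0] == num_classes
        # Frozen KG embeddings: one vector per class, derived from one
        # contextual view of the knowledge graph (e.g., a co-occurrence view).
        self.kg_emb = nn.Parameter(kg_view_embeddings, requires_grad=False)
        backbone = models.resnet18(weights=None)
        feat_dim = backbone.fc.in_features
        backbone.fc = nn.Identity()  # strip the original classification head
        self.backbone = backbone
        # Project image features into the KG embedding space of the view.
        self.proj = nn.Linear(feat_dim, kg_view_embeddings.shape[1])

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        feats = self.proj(self.backbone(images))  # (B, kg_dim)
        # Logits = similarity between projected image features and the
        # context-conditioned class embeddings; swapping in a different
        # contextual view changes the learned representations.
        return feats @ self.kg_emb.t()            # (B, num_classes)

# Usage: 10 classes with hypothetical 64-dimensional KG view embeddings.
model = ContextFusionNet(torch.randn(10, 64), num_classes=10)
logits = model(torch.randn(4, 3, 224, 224))       # shape (4, 10)
```

Under this sketch, comparing models trained with embeddings from different contextual views of the same knowledge graph would expose how each view shapes the image representations, in the spirit of the experiments described above.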