Algorithms based on deep network models are being used for many pattern recognition and decision-making tasks in robotics and AI. Training these models requires a large labeled dataset and considerable computational resources, which are not readily available in many domains. In addition, it is difficult to explore the internal representations and reasoning mechanisms of these models. As a step towards addressing the underlying knowledge representation, reasoning, and learning challenges, the architecture described in this paper draws inspiration from research in cognitive systems. As a motivating example, we consider an assistive robot trying to reduce clutter in any given scene by reasoning about the occlusion of objects and the stability of object configurations in an image of the scene. In this context, our architecture incrementally learns and revises a grounding of the spatial relations between objects and uses this grounding to extract spatial information from input images. Non-monotonic logical reasoning with this information and incomplete commonsense domain knowledge is used to make decisions about stability and occlusion. For images that cannot be processed by such reasoning, regions relevant to the tasks at hand are automatically identified and used to train deep network models to make the desired decisions. The image regions used to train the deep networks are also used to incrementally acquire previously unknown state constraints, which are merged with the existing knowledge for subsequent reasoning. Experimental evaluation performed using simulated and real-world images indicates that, in comparison with baselines based only on deep networks, our architecture improves the reliability of decision making and reduces the effort involved in training data-driven deep network models.
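The control flow summarized above (reason with commonsense knowledge first, and fall back to a learned model only when reasoning cannot reach a decision) can be illustrated with a minimal Python sketch. This is not the implementation used in the architecture. The names `Relation`, `Scene`, `reason_about_stability`, and `decide_stability`, the single default rule with one exception, and the `complete` flag are all hypothetical and chosen only for illustration, and the deep network is replaced by an arbitrary callable.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class Relation:
    """A grounded spatial relation extracted from an image, e.g. above(a, b)."""
    name: str       # hypothetical relation name, e.g. "above"
    subject: str    # the object the relation is about
    reference: str  # the object it is related to


@dataclass(frozen=True)
class Scene:
    relations: frozenset          # set of Relation instances
    irregular_surface: frozenset  # objects known to have an irregular top surface
    complete: bool = True         # assumption: False if relation extraction failed


def reason_about_stability(scene: Scene, obj: str) -> Optional[bool]:
    """Toy default reasoning: an object is stable unless a known exception applies.

    Returns None when no conclusion can be drawn, which signals that the
    learned model should be consulted instead.
    """
    if not scene.complete:
        return None
    supports = [r.reference for r in scene.relations
                if r.name == "above" and r.subject == obj]
    # Exception to the default: resting on an irregular surface.
    if any(s in scene.irregular_surface for s in supports):
        return False
    # Default conclusion, which later knowledge may retract.
    return True


def decide_stability(scene: Scene, obj: str,
                     fallback: Callable[[Scene, str], bool]) -> Tuple[bool, str]:
    """Try reasoning first; use the fallback model only if reasoning is silent."""
    answer = reason_about_stability(scene, obj)
    if answer is not None:
        return answer, "reasoning"
    return fallback(scene, obj), "network"


if __name__ == "__main__":
    a_on_b = Relation("above", "a", "b")
    scene = Scene(frozenset({a_on_b}), frozenset({"b"}))
    # The lambda stands in for a trained deep network.
    print(decide_stability(scene, "a", lambda s, o: True))
```

The non-monotonic aspect appears in the default rule: adding the fact that `b` has an irregular surface retracts the earlier conclusion that `a` is stable. In the architecture itself, cases routed to the fallback would also supply the image regions used for training and for acquiring new constraints; the sketch omits that step.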