DeepSym: Deep Symbol Generation and Rule Learning from Unsupervised Continuous Robot Interaction for Planning

We propose a novel general method that finds action-grounded, discrete object and effect categories and builds probabilistic rules over them for non-trivial action planning. Our robot interacts with objects using an initial action repertoire that is assumed to have been acquired earlier, and observes the effects it can create in the environment. To form action-grounded object, effect, and relational categories, we employ a binary bottleneck layer in a predictive, deep encoder-decoder network that takes the image of the scene and the applied action as input and generates the resulting effects in the scene in pixel coordinates. After learning, the binary latent vector represents action-driven object categories based on the interaction experience of the robot. To distill the knowledge represented by the neural network into rules useful for symbolic reasoning, a decision tree is trained to reproduce its decoder function. Probabilistic rules are extracted from the decision paths of the tree and are represented in the Probabilistic Planning Domain Definition Language (PPDDL), allowing off-the-shelf planners to operate on the knowledge extracted from the sensorimotor experience of the robot. Deploying the proposed approach on a simulated robotic manipulator enabled the discovery of discrete representations of object properties such as 'rollable' and 'insertable'. In turn, using these representations as symbols allowed the generation of effective plans for achieving goals such as building towers of a desired height, demonstrating the effectiveness of the approach for multi-step object manipulation. Finally, we demonstrate that the system is not restricted to the robotics domain by assessing its applicability to the MNIST 8-puzzle domain, in which learned symbols allow for the generation of plans that move the empty tile into any given position.
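To make the rule-extraction step more concrete, the following is a minimal sketch, not the authors' implementation. The abstract states that a decision tree is trained to reproduce the decoder and that rules are read off its decision paths; here a plain frequency table over exact bit patterns stands in for the tree, which is a deliberate simplification. The predicate names `z0`, `z1`, the action name `push`, and the effect names `rolled` and `slid` are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def extract_rules(transitions):
    """Estimate effect probabilities for each (symbols, action) pair.

    transitions: iterable of (symbols, action, effect), where symbols is a
    tuple of 0/1 bits (assumed output of a binary bottleneck), action is a
    string, and effect is a string naming a discrete effect category.
    NOTE: this counts exact bit patterns; it is a stand-in for the decision
    tree described in the abstract, not a reproduction of it.
    """
    counts = defaultdict(Counter)
    for symbols, action, effect in transitions:
        counts[(tuple(symbols), action)][effect] += 1

    rules = {}
    for key, effect_counts in counts.items():
        total = sum(effect_counts.values())
        rules[key] = {e: c / total for e, c in effect_counts.items()}
    return rules


def to_ppddl(action, symbols, outcomes):
    """Render one rule as a PPDDL-style action with a probabilistic effect.

    Predicates z0, z1, ... are hypothetical names for the latent bits.
    """
    pre = " ".join(
        f"(z{i})" if bit else f"(not (z{i}))" for i, bit in enumerate(symbols)
    )
    eff = " ".join(f"{p:.2f} ({e})" for e, p in sorted(outcomes.items()))
    name = f"{action}-" + "".join(str(b) for b in symbols)
    return (
        f"(:action {name}\n"
        f"  :precondition (and {pre})\n"
        f"  :effect (probabilistic {eff}))"
    )


# Toy interaction data (invented for illustration only).
transitions = [
    ((1, 0), "push", "rolled"),
    ((1, 0), "push", "rolled"),
    ((1, 0), "push", "rolled"),
    ((1, 0), "push", "slid"),
    ((0, 1), "push", "slid"),
]

rules = extract_rules(transitions)
print(to_ppddl("push", (1, 0), rules[((1, 0), "push")]))
```

A decision tree would additionally merge bit patterns that behave alike, so that a rule's precondition mentions only the bits that matter for the effect; the table above keeps every bit in every precondition.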
