Embodied AI is an inevitable trend that emphasizes the interaction between intelligent entities and the real world, with broad applications in robotics, especially target-driven navigation. This task requires the robot to efficiently find an object of a given category in an unknown domestic environment. Recent works focus on exploiting layout relationships with graph neural networks (GNNs). However, most of them obtain robot actions directly from observations in an end-to-end manner via an incomplete relation graph, which is neither interpretable nor reliable. We decouple this task and propose ReVoLT, a hierarchical framework consisting of (a) an object detection visual front-end, (b) a high-level reasoner that infers semantic sub-goals, (c) an intermediate-level planner that computes geometrical positions, and (d) a low-level controller that executes actions. ReVoLT operates on a multi-layer semantic-spatial topological graph. The reasoner uses multiform structured relations as priors, which are obtained from combinatorial relation extraction networks composed of unsupervised GraphSAGE, GCN, and GraphRNN-based Region Rollout. The reasoner applies Upper Confidence Bound for Trees (UCT) to infer semantic sub-goals, accounting for the trade-off between exploitation (depth-first searching) and exploration (regretting). The lightweight intermediate-level planner generates instantaneous spatial sub-goal locations via an online-constructed Voronoi local graph. Simulation experiments demonstrate that our framework achieves better performance in target-driven navigation tasks and generalizes well, with an 80% improvement over the existing state-of-the-art method. The code and result video will be released at https://ventusff.github.io/ReVoLT-website/.
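As a rough illustration of the exploitation/exploration trade-off that UCT balances, the following minimal sketch applies the standard UCB1 selection rule to a set of candidate sub-goals. It is not ReVoLT's reasoner: the function names `uct_score` and `select_subgoal`, the room labels, the reward values, and the exploration constant `c` are all assumptions made for illustration only.

```python
import math


def uct_score(total_value, visits, parent_visits, c=math.sqrt(2)):
    """Standard UCB1 score: mean value plus an exploration bonus.

    Unvisited candidates score +inf so that each one is tried at least once.
    """
    if visits == 0:
        return math.inf
    exploit = total_value / visits                               # average reward so far
    explore = c * math.sqrt(math.log(parent_visits) / visits)    # bonus for rarely tried options
    return exploit + explore


def select_subgoal(children, c=math.sqrt(2)):
    """Pick the candidate with the highest UCB1 score.

    `children` maps a hypothetical sub-goal name to (total_value, visits).
    """
    parent_visits = sum(visits for _, visits in children.values())
    return max(
        children,
        key=lambda name: uct_score(children[name][0], children[name][1], parent_visits, c),
    )


if __name__ == "__main__":
    # Hypothetical statistics: (accumulated reward, number of visits).
    candidates = {"kitchen": (6.0, 10), "bedroom": (1.0, 2)}
    print(select_subgoal(candidates))         # exploration bonus favors the rarely visited room
    print(select_subgoal(candidates, c=0.0))  # pure exploitation favors the higher mean
```

With these made-up numbers, the default constant selects `bedroom` because it has been visited only twice, while `c=0.0` reduces the rule to greedy exploitation and selects `kitchen`. A larger `c` shifts the balance further toward exploration.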