We investigate pneumatic non-prehensile manipulation (i.e., blowing) as a means of efficiently moving scattered objects into a target receptacle. Due to the chaotic nature of aerodynamic forces, a blowing controller must (i) continually adapt to the unexpected consequences of its own actions, (ii) maintain fine-grained control, since the slightest misstep can have large unintended consequences (e.g., scattering objects already gathered in a pile), and (iii) infer long-range plans (e.g., moving the robot to strategic blowing locations). We tackle these challenges in the context of deep reinforcement learning, introducing a multi-frequency version of the spatial action maps framework. This allows for efficient learning of vision-based policies that effectively combine high-level planning with low-level closed-loop control for dynamic mobile manipulation. Experiments show that our system learns efficient behaviors for the task, demonstrating in particular that blowing achieves better downstream performance than pushing, and that our policies improve performance over baselines. Moreover, we show that our system naturally encourages emergent specialization between the different subpolicies, spanning low-level fine-grained control and high-level planning. On a real mobile robot equipped with a miniature air blower, we show that our simulation-trained policies transfer well to a real environment and can generalize to novel objects.