Accurately manipulating articulated objects is a challenging yet important task for real robot applications. In this paper, we present a novel framework called Sim2Real$^2$ that enables a robot to precisely manipulate an unseen articulated object to a desired state in the real world without human demonstrations. We leverage recent advances in physics simulation and learning-based perception to build an interactive, explicit physics model of the object and use it to plan a long-horizon manipulation trajectory that accomplishes the task. However, such an interactive model cannot be correctly estimated from a single static observation. Therefore, we learn to predict the object's affordance from a single-frame point cloud, control the robot to actively interact with the object via a one-step action, and capture a second point cloud. The physics model is then constructed from the two point clouds. Experimental results show that our framework completes about 70% of manipulations with less than 30% relative error on common articulated objects, and about 30% on difficult objects. Our proposed framework also enables advanced manipulation strategies, such as manipulating with different tools. Code and videos are available on our project webpage: https://ttimelord.github.io/Sim2Real2-site/
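To make the perceive-interact-model-plan loop described above concrete, the following is a minimal Python sketch of the pipeline. All function names (`predict_affordance`, `one_step_interaction`, `build_physics_model`, `plan_to_target`) and their signatures are hypothetical placeholders for illustration, not the authors' actual API; the bodies are stubs standing in for the learned and planning modules.

```python
# Hypothetical sketch of the Sim2Real^2 pipeline; names and logic are
# illustrative placeholders, not the paper's implementation.
import numpy as np

def predict_affordance(point_cloud: np.ndarray) -> np.ndarray:
    """Learned module (stub): score each point for how useful it is to interact with."""
    # A real system would run a trained network here; we return uniform scores.
    return np.full(point_cloud.shape[0], 0.5)

def one_step_interaction(point_cloud: np.ndarray, scores: np.ndarray):
    """Select the highest-affordance point and a short one-step action at it."""
    idx = int(np.argmax(scores))
    contact = point_cloud[idx]
    direction = np.array([0.0, 0.0, 1.0])  # placeholder push/pull direction
    return contact, direction

def build_physics_model(cloud_before: np.ndarray, cloud_after: np.ndarray) -> dict:
    """Fit an explicit articulated model (joint type, axis, state) from two clouds."""
    # Stub: a real system estimates joint parameters from point-cloud motion.
    return {"joint_type": "revolute", "axis": np.array([1.0, 0.0, 0.0]), "state": 0.0}

def plan_to_target(model: dict, target_state: float) -> list:
    """Plan a long-horizon trajectory in simulation using the fitted model."""
    # Stub: trivial two-waypoint "plan" from current to target joint state.
    return [model["state"], target_state]

# Toy usage: random clouds stand in for real depth-camera observations.
cloud_0 = np.random.rand(1024, 3)
scores = predict_affordance(cloud_0)
contact, action = one_step_interaction(cloud_0, scores)
cloud_1 = np.random.rand(1024, 3)  # observation captured after the interaction
model = build_physics_model(cloud_0, cloud_1)
trajectory = plan_to_target(model, target_state=1.2)
print(trajectory)
```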