Cloth folding is a widespread domestic task that is seamlessly performed by humans but highly challenging for autonomous robots to execute due to the highly deformable nature of textiles; it is difficult to engineer or learn manipulation pipelines that execute it efficiently. In this paper, we propose a new solution for robotic cloth folding (using a standard folding board) via learning from demonstrations. Our demonstration video encoding is based on a high-level abstraction, namely a refined optical flow-based spatiotemporal graph, as opposed to a low-level encoding such as raw image pixels. By constructing a new spatiotemporal graph with an advanced visual correspondence descriptor, policy learning can focus on key points and their relations within a 3D spatial configuration, which allows the policy to generalize quickly across different environments. To further accelerate the policy search, we combine optical flow with static motion saliency maps to discriminate the dominant motions and better handle the system dynamics in real time, in line with the attentional motion mechanism that dominates the human imitation process. To validate the proposed approach, we analyze the manual folding procedure and develop a custom-made end-effector to interact efficiently with the folding board. Multiple experiments on a real robotic platform were conducted to validate the effectiveness and robustness of the proposed method.
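The following is a minimal sketch, not the authors' implementation, of how optical flow can be fused with a static saliency map to highlight dominant motions in a pair of demonstration frames. It assumes opencv-contrib-python (for the `cv2.saliency` module); the function name `dominant_motion_mask` and the parameters `flow_weight` and `thresh` are illustrative choices, as are the use of Farneback flow and spectral-residual saliency.

```python
# Sketch: fuse dense optical flow with static saliency to isolate dominant motion.
# Requires opencv-contrib-python (cv2.saliency is in the contrib modules).
import cv2
import numpy as np

def dominant_motion_mask(prev_bgr, curr_bgr, flow_weight=0.7, thresh=0.5):
    """Return a binary mask of pixels that are both salient and strongly moving."""
    prev_gray = cv2.cvtColor(prev_bgr, cv2.COLOR_BGR2GRAY)
    curr_gray = cv2.cvtColor(curr_bgr, cv2.COLOR_BGR2GRAY)

    # Dense Farneback optical flow between consecutive frames.
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    flow_mag = np.linalg.norm(flow, axis=2).astype(np.float32)
    flow_mag = cv2.normalize(flow_mag, None, 0.0, 1.0, cv2.NORM_MINMAX)

    # Static (spectral-residual) saliency on the current frame.
    saliency = cv2.saliency.StaticSaliencySpectralResidual_create()
    ok, sal_map = saliency.computeSaliency(curr_bgr)
    sal_map = sal_map.astype(np.float32) if ok else np.ones_like(flow_mag)

    # Weighted fusion of motion magnitude and appearance saliency, then threshold.
    fused = flow_weight * flow_mag + (1.0 - flow_weight) * sal_map
    return (fused > thresh).astype(np.uint8)
```

In this sketch, the weighting simply down-weights visually salient but static background regions; any comparable fusion rule (e.g. element-wise product) could be substituted depending on how the dominant-motion cue is defined.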