Promising results have recently been achieved in category-level manipulation that generalizes across object instances. Nevertheless, such approaches often require expensive real-world data collection and manual specification of semantic keypoints for each object category and task. In addition, coarse keypoint predictions and the neglect of intermediate action sequences hinder their adoption in complex manipulation tasks beyond pick-and-place. This work proposes a novel category-level manipulation framework that leverages an object-centric, category-level representation and model-free 6 DoF motion tracking. The canonical object representation is learned solely in simulation and then used to parse a category-level task trajectory from a single demonstration video. The demonstration is reprojected into a target trajectory tailored to a novel object via the canonical representation. During execution, the manipulation horizon is decomposed into long-range, collision-free motion and last-inch manipulation. For the latter, a category-level behavior cloning (CatBC) method leverages motion tracking to perform closed-loop control. CatBC follows the target trajectory, projected from the demonstration and anchored to a dynamically selected category-level coordinate frame. The frame is automatically selected along the manipulation horizon by a local attention mechanism. This framework allows different manipulation strategies to be taught from a single demonstration, without complicated manual programming. Extensive experiments demonstrate its efficacy on a range of challenging industrial tasks in high-precision assembly, which involve learning complex, long-horizon policies. The framework exhibits robustness against uncertainty due to dynamics, as well as generalization across object instances and scene configurations. The supplementary video is available at https://www.youtube.com/watch?v=WAr8ZY3mYyw
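To make the trajectory-reprojection step concrete, the sketch below illustrates one plausible reading of it, assuming the canonical representation supplies a 6 DoF pose for each object instance in a shared category-level frame. All names here (`reproject_trajectory`, the 4x4 homogeneous-transform inputs) are hypothetical and not taken from the paper; this is a minimal illustration of the geometry, not the authors' implementation.

```python
import numpy as np

def reproject_trajectory(demo_ee_poses, T_canon_demo, T_canon_novel):
    """Reproject a demonstrated end-effector trajectory onto a novel object.

    Hypothetical sketch: all arguments are 4x4 homogeneous transforms.
      demo_ee_poses : list of end-effector poses in the demo object's frame
      T_canon_demo  : pose of the demo object in the shared canonical frame
      T_canon_novel : pose of the novel object in the shared canonical frame

    Each demo pose is lifted into the canonical frame and re-expressed
    relative to the novel object, yielding a target trajectory anchored
    to the category-level frame rather than to one specific instance.
    """
    # Transform mapping the demo object's frame into the novel object's
    # frame via the canonical frame: novel <- canonical <- demo.
    T_novel_demo = np.linalg.inv(T_canon_novel) @ T_canon_demo
    return [T_novel_demo @ T for T in demo_ee_poses]

# Example: identity demo pose; novel object translated 5 cm along x in
# the canonical frame, so the reprojected waypoint shifts accordingly.
T_demo = np.eye(4)
T_novel = np.eye(4)
T_novel[0, 3] = 0.05
waypoints = reproject_trajectory([np.eye(4)], T_demo, T_novel)
print(waypoints[0][:3, 3])  # -> [-0.05  0.    0.  ]
```

Note that this sketch fixes the anchor to a single object frame, whereas the abstract describes the anchor frame being selected dynamically along the manipulation horizon by a local attention mechanism; the reprojection above would then be applied per segment with the currently selected frame.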