Imitation learning (IL) algorithms typically distil experience into parametric behavior policies that mimic expert demonstrations. With limited experience, previous methods often struggle to accurately align the current state with expert demonstrations, particularly in tasks characterised by partial observations or dynamic object deformations. We consider imitation learning for deformable mobile manipulation with an ego-centric, limited field of view and introduce a novel IL approach, DeMoBot, that directly retrieves observations from demonstrations. DeMoBot uses vision foundation models to identify relevant expert data based on visual similarity, then matches the current trajectory against demonstrated trajectories using trajectory similarity and forward reachability constraints to select suitable sub-goals. A goal-conditioned motion generation policy then guides the robot toward each sub-goal until the task is completed. We evaluate DeMoBot on a Spot robot in several simulated and real-world settings, demonstrating its effectiveness and generalizability. With only 20 demonstrations, DeMoBot outperforms the baselines, attaining high success rates in gap covering (85% simulation, 80% real-world) and table uncovering (87.5% simulation, 70% real-world), and showing promise on harder tasks such as curtain opening (47.5% simulation, 35% real-world). Additional details are available at: https://sites.google.com/view/demobot-fewshot/home
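For intuition, here is a minimal sketch (Python with numpy) of the retrieve-then-select loop the abstract describes: match the current observation and recent trajectory against a demonstration buffer, then pick a frame a few steps ahead of the best match as the sub-goal handed to the goal-conditioned policy. All names (`select_subgoal`, `cosine_sim`, the feature/state tuples) are illustrative assumptions, and the fixed look-ahead horizon is only a stand-in for the paper's forward reachability constraint, not DeMoBot's actual implementation.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two feature vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def select_subgoal(obs_feat, traj_feats, demos, horizon=5, sim_thresh=0.8):
    """Retrieve the demo frame most similar to the current observation,
    then return a forward frame on that demo as the sub-goal.

    obs_feat   : feature of the current ego-centric observation
               (e.g., from a vision foundation model -- assumed given)
    traj_feats : features of the robot's recent observation trajectory
    demos      : list of demos, each a list of (feature, state) frames
    horizon    : look-ahead in frames; a crude proxy for the
                 forward-reachability constraint (assumption)
    """
    best = (-1.0, None, None)  # (score, demo index, frame index)
    for d, demo in enumerate(demos):
        for t, (feat, _) in enumerate(demo):
            score = cosine_sim(obs_feat, feat)
            # Blend single-frame similarity with how well the recent
            # trajectory matches the demo segment ending at this frame.
            k = min(len(traj_feats), t)
            if k > 0:
                seg = [cosine_sim(traj_feats[-i], demo[t - i][0])
                       for i in range(1, k + 1)]
                score = 0.5 * score + 0.5 * float(np.mean(seg))
            if score > best[0]:
                best = (score, d, t)
    score, d, t = best
    if score < sim_thresh:
        return None  # no sufficiently similar demonstration frame
    # Sub-goal: a frame a few steps ahead of the match, clipped to demo end.
    g = min(t + horizon, len(demos[d]) - 1)
    return demos[d][g][1]  # state for the goal-conditioned motion policy
```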