Learning from Demonstration (LfD) is a widely used technique for skill acquisition in robotics. However, demonstrations of the same skill may exhibit significant variance, or a learning system may attempt to acquire several distinct ways of performing the same skill simultaneously, which makes it challenging to encode these motions into movement primitives. To address these challenges, we propose an LfD framework, Conditional Neural Expert Processes (CNEP), that learns to assign demonstrations from different modes to distinct expert networks, using the information inherent in the latent space to match experts with the encoded representations. CNEP does not require supervision on which mode each trajectory belongs to. We compare CNEP against widely used LfD methods such as Gaussian Mixture Models, Probabilistic Movement Primitives, and Stable Movement Primitives, and show that our method outperforms these baselines on multimodal trajectory datasets. The results reveal improved modeling performance for movement primitives, leading to synthesized trajectories that more accurately reflect those demonstrated by experts, particularly when trajectories from different demonstrations intersect. We evaluate CNEP on two real-robot tasks, obstacle avoidance and pick-and-place, that require the robot to learn multimodal motion trajectories and execute the correct primitive given the target environment conditions. We also show that our system is capable of on-the-fly adaptation to environmental changes via an online conditioning mechanism. Lastly, we believe that CNEP offers improved explainability and interpretability by autonomously discovering discrete behavior primitives and providing probability values for its expert selection decisions.