Scenarios in which humans must choose among multiple seemingly optimal actions are commonplace, yet standard imitation learning often fails to capture this behavior. Instead, an over-reliance on replicating expert actions induces inflexible and unstable policies, leading to poor generalization in applications. To address this problem, this paper presents the first imitation learning framework that incorporates Bayesian variational inference to learn flexible non-parametric multi-action policies, while simultaneously robustifying the policies against sources of error by injecting and optimizing disturbances to create a richer demonstration dataset. This combined approach forces the policy to adapt to challenging situations, enabling stable multi-action policies to be learned efficiently. The effectiveness of the proposed method is evaluated through simulations and real-robot experiments on a table-sweep task using the UR3 6-DOF robotic arm. The results show that, through improved flexibility and robustness, our method outperforms comparison methods in both learning performance and control safety.
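As a rough illustration of the two ideas in the abstract, the sketch below pairs disturbance injection during demonstration collection with a multi-modal action model; it is not the paper's actual method. The 1-D toy dynamics, the expert with two equally good sweep directions, the Gaussian disturbance, and the helper names `collect_demos` and `fit_gmm_em` are all hypothetical, and a finite Gaussian mixture fit by plain EM stands in for the Bayesian variational non-parametric policy described above.

```python
import numpy as np

rng = np.random.default_rng(0)

def expert_action(state):
    # Hypothetical expert with two seemingly optimal choices:
    # sweep left or right with equal preference (a multi-modal policy).
    direction = rng.choice([-1.0, 1.0])
    return direction * np.clip(1.0 - abs(state), 0.0, 1.0)

def collect_demos(n, noise_std):
    # Disturbance injection: perturb the executed action so the dataset
    # also covers off-nominal states, while labeling each state with the
    # clean expert action. This yields a richer demonstration dataset.
    states, actions = [], []
    s = 0.0
    for _ in range(n):
        a = expert_action(s)
        noisy_a = a + rng.normal(0.0, noise_std)   # injected disturbance
        states.append(s)
        actions.append(a)
        s = np.clip(s + 0.1 * noisy_a, -1.0, 1.0)  # simple 1-D dynamics
    return np.array(states), np.array(actions)

def fit_gmm_em(x, k=2, iters=50):
    # EM for a 1-D Gaussian mixture: a finite-mixture stand-in for the
    # non-parametric multi-action policy, capturing both action modes
    # instead of averaging them into a single (unsafe) mean action.
    mu = rng.choice(x, size=k, replace=False)
    var = np.full(k, x.var() + 1e-6)
    pi = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: responsibility of each component for each sample.
        dens = pi * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) \
               / np.sqrt(2 * np.pi * var)
        r = dens / (dens.sum(axis=1, keepdims=True) + 1e-300)
        # M-step: re-estimate mixture weights, means, and variances.
        nk = r.sum(axis=0)
        pi = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
    return pi, mu, var

states, actions = collect_demos(500, noise_std=0.2)
pi, mu, var = fit_gmm_em(actions)
print(pi, mu)  # expect two modes of roughly opposite sign
```

A unimodal regressor fit to the same data would average the two modes toward zero action, which is precisely the inflexible, unstable behavior the abstract attributes to standard imitation learning.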