Imitation learning algorithms learn a policy from demonstrations of expert behavior. Somewhat counterintuitively, we show that, for deterministic experts, imitation learning can be done by reduction to reinforcement learning, which is commonly considered more difficult. Experiments on a continuous control task confirm that our reduction works well in practice.