Imitation learning has demonstrated remarkable performance across various domains, yet it remains constrained by several prerequisites. The research community has worked intensively to relax these constraints, for example by introducing stochastic policies to avoid unseen states, eliminating the need for action labels, and learning from suboptimal demonstrations. Inspired by the natural reproduction process, we propose GenIL, a method that integrates a Genetic Algorithm with imitation learning. The Genetic Algorithm improves data efficiency by reproducing trajectories with varied returns and helps the model estimate more accurate and compact reward function parameters. We evaluated GenIL in both the Atari and Mujoco domains, and the results show that it outperforms previous extrapolation methods in extrapolation accuracy, robustness, and overall policy performance when input data is limited.
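To make the trajectory-reproduction idea concrete, the following is a minimal sketch in Python. The abstract only states that a Genetic Algorithm reproduces trajectories with varied returns; the single-point crossover, the Gaussian mutation, and the `Trajectory`/`reproduce` names below are illustrative assumptions, not the paper's actual operators.

```python
import random
from dataclasses import dataclass
from typing import List

import numpy as np

# Hypothetical sketch of GenIL-style trajectory reproduction. The crossover and
# mutation operators here are assumptions for illustration; the paper does not
# specify them in the abstract.

@dataclass
class Trajectory:
    states: np.ndarray  # shape (T, state_dim)
    ret: float          # return associated with the trajectory


def crossover(a: Trajectory, b: Trajectory) -> Trajectory:
    """Splice a prefix of one parent onto a suffix of the other (single-point)."""
    cut_a = random.randrange(1, len(a.states))
    cut_b = random.randrange(1, len(b.states))
    states = np.concatenate([a.states[:cut_a], b.states[cut_b:]])
    # Approximate the child's return by length-weighting the parents' returns.
    w = cut_a / (cut_a + (len(b.states) - cut_b))
    return Trajectory(states, w * a.ret + (1 - w) * b.ret)


def mutate(t: Trajectory, noise: float = 0.01) -> Trajectory:
    """Perturb states with small Gaussian noise to diversify offspring."""
    return Trajectory(t.states + noise * np.random.randn(*t.states.shape), t.ret)


def reproduce(demos: List[Trajectory], n_offspring: int) -> List[Trajectory]:
    """Grow a pool of trajectories with varied returns from a few demonstrations."""
    pool = list(demos)
    while len(pool) < len(demos) + n_offspring:
        parent_a, parent_b = random.sample(pool, 2)
        pool.append(mutate(crossover(parent_a, parent_b)))
    return pool
```

The enlarged pool, ordered by return, could then supervise a reward model via pairwise ranking in the style of prior reward-extrapolation work, which is consistent with, though not specified by, the abstract.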