Although reinforcement learning has found widespread use in dense reward settings, training autonomous agents with sparse rewards remains challenging. To address this difficulty, prior work has shown promising results when using not only task-specific demonstrations but also task-agnostic albeit somewhat related demonstrations. In most cases, the available demonstrations are distilled into an implicit prior, commonly represented via a single deep net. Explicit priors in the form of a database that can be queried have also been shown to lead to encouraging results. To better benefit from available demonstrations, we develop a method to Combine Explicit and Implicit Priors (CEIP). CEIP exploits multiple implicit priors in the form of normalizing flows in parallel to form a single complex prior. Moreover, CEIP uses an effective explicit retrieval and push-forward mechanism to condition the implicit priors. In three challenging environments, we find the proposed CEIP method to improve upon sophisticated state-of-the-art techniques.
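To make the idea of combining several implicit priors "in parallel" concrete, below is a minimal PyTorch sketch. It is not the paper's exact architecture: each demonstration source gets its own state-conditioned affine flow, and a state-dependent softmax gate mixes their outputs into a single prior over actions. All names and hyperparameters (ConditionalAffineFlow, ParallelFlowPrior, gate, hidden sizes) are illustrative assumptions, and the retrieval/push-forward conditioning described in the abstract is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConditionalAffineFlow(nn.Module):
    """One implicit prior: an affine flow a = mu(s) + exp(log_sigma(s)) * z,
    mapping base noise z to an action a, conditioned on the state s."""
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * action_dim),
        )

    def forward(self, z, s):
        mu, log_sigma = self.net(s).chunk(2, dim=-1)
        a = mu + log_sigma.exp() * z
        log_det = log_sigma.sum(dim=-1)  # log |det da/dz|, kept for likelihood training
        return a, log_det


class ParallelFlowPrior(nn.Module):
    """Combine several per-source flows in parallel with state-dependent
    softmax weights, yielding a single, more expressive prior over actions."""
    def __init__(self, n_flows, state_dim, action_dim):
        super().__init__()
        self.flows = nn.ModuleList(
            [ConditionalAffineFlow(state_dim, action_dim) for _ in range(n_flows)]
        )
        self.gate = nn.Linear(state_dim, n_flows)  # mixture weights over the flows

    def forward(self, z, s):
        w = F.softmax(self.gate(s), dim=-1)                         # (batch, n_flows)
        outs = torch.stack([f(z, s)[0] for f in self.flows], dim=1)  # (batch, n_flows, action_dim)
        return (w.unsqueeze(-1) * outs).sum(dim=1)                   # weighted parallel combination


# Usage: sample an action proposal for a batch of states.
prior = ParallelFlowPrior(n_flows=4, state_dim=10, action_dim=3)
s = torch.randn(8, 10)   # states
z = torch.randn(8, 3)    # base noise
action = prior(z, s)     # shape (8, 3)
```

Because each component flow is affine in z, the gated sum remains an invertible affine map of z for any fixed state, so the combined object is still a valid normalizing flow under these simplifying assumptions.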