Curricula for goal-conditioned reinforcement learning agents typically rely on poor estimates of the agent's epistemic uncertainty, or fail to consider epistemic uncertainty altogether, resulting in poor sample efficiency. We propose a novel algorithm, Query The Agent (QTA), which significantly improves sample efficiency by estimating the agent's epistemic uncertainty throughout the state space and setting goals in highly uncertain areas. Encouraging the agent to collect data in highly uncertain states lets it rapidly improve its estimate of the value function. QTA uses a novel technique for estimating epistemic uncertainty, Predictive Uncertainty Networks (PUN), to assess the agent's uncertainty in all previously observed states. We demonstrate that QTA offers decisive sample-efficiency improvements over preexisting methods.
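The core loop described above, picking the next goal at the previously observed state where the agent's value estimate is most uncertain, can be sketched as follows. This is a minimal illustrative sketch, not the paper's PUN architecture: it assumes epistemic uncertainty is approximated by disagreement (variance) across an ensemble of value predictors, and the hypothetical names `epistemic_uncertainty` and `select_goal` are introduced here for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in uncertainty estimator (an assumption, not the paper's PUN):
# each ensemble member is a random linear value predictor over state features.
n_members, state_dim = 5, 4
ensemble = [rng.normal(size=state_dim) for _ in range(n_members)]

def epistemic_uncertainty(states):
    """Per-state disagreement (variance) across ensemble value predictions."""
    preds = np.stack([states @ w for w in ensemble])  # (n_members, n_states)
    return preds.var(axis=0)                           # (n_states,)

def select_goal(observed_states):
    """Set the next goal at the observed state with maximal uncertainty."""
    u = epistemic_uncertainty(observed_states)
    return observed_states[np.argmax(u)]

# Usage: choose a goal from a buffer of previously observed states.
observed = rng.normal(size=(100, state_dim))
goal = select_goal(observed)
```

Directing data collection toward the argmax-uncertainty state is what drives the sample-efficiency gains the abstract claims: the agent gathers experience exactly where its value function is least trusted.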