Effective and intelligent exploration remains an unresolved problem in reinforcement learning. Most contemporary reinforcement learning relies on simple heuristic strategies such as $\epsilon$-greedy exploration or adding Gaussian noise to actions. These heuristics, however, cannot distinguish between well-explored and unexplored regions of the state space, which can lead to inefficient use of training time. We introduce entropy-based exploration (EBE), which enables an agent to efficiently explore the unexplored regions of the state space. EBE quantifies the agent's learning in a state using only the state-dependent action values and explores the state space adaptively, i.e., it allocates more exploration to less-explored regions. We perform experiments on a diverse set of environments and demonstrate that EBE enables efficient exploration that ultimately results in faster learning, without having to tune any hyperparameters. The code to reproduce the experiments is available at \url{https://github.com/Usama1002/EBE-Exploration} and the supplementary video at \url{https://youtu.be/nJggIjjzKic}.
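To make the idea of quantifying learning from state-dependent action values concrete, the following is a minimal sketch, assuming the entropy is computed over a softmax distribution of the Q-values in a state and then used as a state-dependent exploration probability. The function name `exploration_probability` and the `temperature` parameter are illustrative assumptions, not taken from the paper.

\begin{verbatim}
import numpy as np

def exploration_probability(q_values, temperature=1.0):
    # Sketch (assumption): normalized entropy of the softmax over a
    # state's action values. Near-uniform Q-values (little learned yet)
    # give a value close to 1; a clearly dominant action gives a value
    # close to 0.
    q = np.asarray(q_values, dtype=np.float64) / temperature
    q = q - q.max()                      # numerical stability
    probs = np.exp(q) / np.exp(q).sum()  # softmax over actions
    entropy = -np.sum(probs * np.log(probs + 1e-12))
    max_entropy = np.log(len(probs))     # entropy of the uniform distribution
    return entropy / max_entropy         # normalized to [0, 1]

# Usage: explore with probability given by the state's normalized entropy,
# otherwise act greedily with respect to the current Q-estimates.
rng = np.random.default_rng(0)
q_s = [1.2, 1.1, 1.15]                   # nearly uniform -> explore often
p_explore = exploration_probability(q_s)
if rng.random() < p_explore:
    action = rng.integers(len(q_s))      # random exploratory action
else:
    action = int(np.argmax(q_s))         # greedy action
\end{verbatim}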