Reinforcement Learning is a powerful tool for modeling decision-making processes. However, it relies on an exploration-exploitation trade-off that remains an open challenge for many tasks. In this work, we study neighboring-state-based, model-free exploration, guided by the intuition that, for an early-stage agent, considering actions derived from a bounded region of nearby states may yield better exploratory actions. We propose two algorithms that choose exploratory actions based on a survey of nearby states, and find that one of our methods, $\rho$-explore, consistently outperforms the Double DQN baseline in a discrete environment by 49\% in terms of Eval Reward Return.
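To make the idea of a "survey of nearby states" concrete, the sketch below shows one plausible interpretation, not the paper's exact algorithm: sample a few states inside a radius-$\rho$ ball around the current state, evaluate a Q-function at each, and take the action that scores highest anywhere in that neighborhood. The function names, the uniform perturbation, and the argmax aggregation rule are all assumptions made for illustration.

```python
# Minimal sketch of neighboring-state-based exploration (assumed mechanism, see lead-in).
import numpy as np

def rho_explore_action(state, q_values, rho=0.1, n_samples=8, rng=None):
    """Pick an exploratory action by surveying states within a radius-rho ball.

    state:     1-D np.ndarray observation.
    q_values:  callable mapping a state to a vector of Q-values (one per action).
    rho:       radius of the bounded region of nearby states (uniform noise assumed).
    n_samples: number of neighboring states to survey.
    """
    rng = rng or np.random.default_rng()
    # Survey the current state plus n_samples perturbed copies of it.
    neighbors = [state] + [
        state + rng.uniform(-rho, rho, size=state.shape) for _ in range(n_samples)
    ]
    # Evaluate Q-values at every surveyed state and return the action that
    # attains the highest value anywhere in the neighborhood.
    q_matrix = np.stack([q_values(s) for s in neighbors])  # shape: (n_samples + 1, n_actions)
    return int(np.unravel_index(np.argmax(q_matrix), q_matrix.shape)[1])
```

In practice, `q_values` would wrap the agent's online Q-network, and this exploratory action would replace the uniform-random action of an epsilon-greedy policy during the exploration branch.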