In recent years, many deep architectures have been proposed for reinforcement learning, mainly for value function estimation and representation. These methods have achieved great success in the Atari 2600 domain. In this paper, we propose an improved architecture based on the Dueling Network, which contains two separate estimators: one approximates the state value function, and the other the state advantage function. Our improvement, based on maximum entropy, yields better policy evaluation than the original network and other value-based architectures in the Atari domain.
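To make the architecture concrete, the following is a minimal sketch of the dueling aggregation described above: the value and advantage streams are combined into Q-values, with the advantage centered to resolve the identifiability issue. The `soft_dueling_q` variant, which replaces the mean with a temperature-scaled log-sum-exp (a "soft maximum"), is an illustrative assumption of how a maximum-entropy aggregation might look, not the paper's exact formula.

```python
import numpy as np

def dueling_q(v, adv):
    """Standard dueling aggregation: Q(s,a) = V(s) + A(s,a) - mean_a A(s,a)."""
    return v + adv - adv.mean(axis=-1, keepdims=True)

def soft_dueling_q(v, adv, alpha=1.0):
    """Hypothetical maximum-entropy variant (an assumption for illustration):
    the mean over advantages is replaced by a temperature-scaled
    log-sum-exp, the soft maximum that appears in maximum-entropy RL."""
    lse = alpha * np.log(np.exp(adv / alpha).sum(axis=-1, keepdims=True))
    return v + adv - lse

# One state with value 1.0 and three actions:
v = np.array([[1.0]])
adv = np.array([[0.5, -0.5, 0.0]])
q = dueling_q(v, adv)          # mean advantage is 0, so Q = [1.5, 0.5, 1.0]
q_soft = soft_dueling_q(v, adv)
```

Since the log-sum-exp is always at least the mean, the soft variant shifts all Q-values down relative to the standard aggregation; only relative Q-values matter for greedy action selection, so both induce the same greedy policy here.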