In this work, we build on recent advances in distributional reinforcement learning to develop a state-of-the-art distributional variant of the IQN-based model. We achieve this by combining the generator and discriminator of a generative adversarial network (GAN) with quantile regression to approximate the full quantile function of the state-action return distribution. We demonstrate improved performance on our benchmark, the 57 Atari 2600 games in the Arcade Learning Environment (ALE). We also use our algorithm to train risk-sensitive policies in Atari games, showing state-of-the-art training performance for both policy optimization and evaluation.
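The quantile-regression step mentioned above can be illustrated with the quantile Huber loss used to train IQN. What follows is a minimal Python sketch that assumes the GAN-based variant keeps this IQN-style loss for its quantile regression. The abstract does not say so. The generator and discriminator are omitted, and the names `huber`, `quantile_huber_loss`, and the default `kappa = 1.0` are hypothetical illustrations, not the authors' code.

```python
from typing import Sequence

# Hypothetical helpers: assumes the IQN-style quantile Huber loss is retained;
# the GAN generator/discriminator from the abstract are not modeled here.


def huber(u: float, kappa: float = 1.0) -> float:
    """Huber loss L_kappa(u): quadratic for |u| <= kappa, linear beyond."""
    if abs(u) <= kappa:
        return 0.5 * u * u
    return kappa * (abs(u) - 0.5 * kappa)


def quantile_huber_loss(
    pred_quantiles: Sequence[float],
    taus: Sequence[float],
    target_samples: Sequence[float],
    kappa: float = 1.0,
) -> float:
    """Quantile Huber loss in the style of IQN.

    pred_quantiles[i] is the predicted return at quantile level taus[i];
    target_samples are samples of the Bellman target r + gamma * Z(x', a').
    """
    if len(pred_quantiles) != len(taus):
        raise ValueError("need one quantile level per prediction")
    if not target_samples:
        raise ValueError("need at least one target sample")
    total = 0.0
    for theta, tau in zip(pred_quantiles, taus):
        for target in target_samples:
            u = target - theta  # TD error for this (prediction, target) pair
            # The asymmetric weight |tau - 1{u < 0}| turns Huber into a quantile loss
            weight = abs(tau - (1.0 if u < 0 else 0.0))
            total += weight * huber(u, kappa) / kappa
    # Sum over predicted quantile levels, average over target samples
    return total / len(target_samples)
```

The asymmetric weight is what makes each output track a specific quantile. For example, with targets `[1.0, 2.0, 3.0, 4.0, 5.0]`, `tau = 0.5`, and a small `kappa`, the candidate among those five values that minimizes the loss is the median, `3.0`.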