Distributional reinforcement learning achieves state-of-the-art performance in continuous and discrete control settings, and the variance and risk of its return distributions can be exploited for exploration. However, while numerous exploration methods in distributional RL use the per-action variance of the return distribution, exploration methods that employ the risk property are scarce. In this paper, we present risk scheduling approaches that explore risk levels and induce optimistic behavior from a risk perspective. Through comprehensive experiments in a multi-agent setting, we demonstrate that risk scheduling improves the performance of the DMIX algorithm.
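To make the idea concrete, here is a minimal sketch of risk-level scheduling for action selection from quantile estimates. The linear schedule, the `eta` tail-averaging distortion, and all function names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def risk_schedule(step, total_steps, start=0.5, end=0.0):
    """Hypothetical linear anneal of the risk parameter eta from an
    optimistic (risk-seeking) level toward risk-neutral over training."""
    frac = min(step / total_steps, 1.0)
    return start + frac * (end - start)

def distorted_action_values(quantiles, eta):
    """Score each action under a simple tail-averaging risk distortion.

    quantiles: array of shape (num_actions, num_quantiles), each row
    holding sorted quantile estimates of an action's return distribution.
    eta > 0 averages the top eta fraction of quantiles (optimistic),
    eta < 0 averages the bottom |eta| fraction (risk-averse),
    eta == 0 recovers the risk-neutral mean.
    """
    n = quantiles.shape[1]
    if eta == 0.0:
        return quantiles.mean(axis=1)
    k = max(1, int(np.ceil(abs(eta) * n)))
    tail = quantiles[:, -k:] if eta > 0 else quantiles[:, :k]
    return tail.mean(axis=1)

# Usage: greedy action selection under the scheduled risk level.
rng = np.random.default_rng(0)
quantiles = np.sort(rng.normal(size=(4, 32)), axis=1)  # 4 actions, 32 quantiles
eta = risk_schedule(step=5_000, total_steps=100_000)
action = int(np.argmax(distorted_action_values(quantiles, eta)))
```

Annealing `eta` toward zero lets agents act optimistically early in training, when exploration matters most, and converge to risk-neutral evaluation later.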