The use of Reinforcement Learning (RL) agents in practical applications requires accounting for suboptimal outcomes, which depend on how familiar the agent is with its environment. This is especially important in safety-critical environments, where errors can lead to high costs or damage. In distributional RL, risk sensitivity can be controlled via different distortion functions applied to the estimated return distribution. However, these distortion functions require an estimate of the risk level, which is difficult to obtain and depends on the current state. In this work, we demonstrate the suboptimality of static risk level estimation and propose a method to dynamically select risk levels at each environment step. Our method ARA (Automatic Risk Adaptation) estimates the appropriate risk level in both known and unknown environments using the Random Network Distillation error. We show failure rates reduced by up to a factor of 7 and generalization performance improved by up to 14% compared to both risk-aware and risk-agnostic agents in several locomotion environments.
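To make the idea concrete, the following is a minimal sketch (not the authors' implementation) of how a Random Network Distillation (RND) prediction error could drive a per-step risk level that distorts a quantile-based return distribution. The mapping `risk_level_from_error`, the exponential decay, and the CVaR-style distortion are illustrative assumptions; the abstract does not specify these details.

```python
import torch
import torch.nn as nn


class RND(nn.Module):
    """Fixed random target network plus a trained predictor network.
    The predictor's error serves as a state-familiarity signal."""

    def __init__(self, obs_dim: int, feat_dim: int = 32):
        super().__init__()
        self.target = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, feat_dim))
        self.predictor = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, feat_dim))
        for p in self.target.parameters():  # target stays fixed, only the predictor is trained
            p.requires_grad_(False)

    def error(self, obs: torch.Tensor) -> torch.Tensor:
        # Mean squared prediction error per observation
        return (self.predictor(obs) - self.target(obs)).pow(2).mean(dim=-1)


def risk_level_from_error(err: torch.Tensor, scale: float = 1.0) -> torch.Tensor:
    """Assumed mapping: large RND error (unfamiliar state) -> small alpha (risk-averse),
    small error (familiar state) -> alpha near 1 (close to risk-neutral)."""
    return torch.exp(-scale * err).clamp(min=1e-2)


def cvar_action_values(quantiles: torch.Tensor, alpha: torch.Tensor) -> torch.Tensor:
    """Distort a return distribution given as quantiles [n_actions, n_quantiles]
    by averaging only the lowest alpha-fraction of quantiles (CVaR_alpha)."""
    n = quantiles.shape[-1]
    k = max(1, int(alpha.item() * n))
    sorted_q, _ = torch.sort(quantiles, dim=-1)
    return sorted_q[..., :k].mean(dim=-1)


if __name__ == "__main__":
    obs_dim, n_actions, n_quantiles = 8, 4, 32
    rnd = RND(obs_dim)
    obs = torch.randn(1, obs_dim)
    quantiles = torch.randn(n_actions, n_quantiles)  # stand-in for a distributional critic's output
    alpha = risk_level_from_error(rnd.error(obs))    # per-step risk level from state familiarity
    action = cvar_action_values(quantiles, alpha).argmax()
    print(f"alpha={alpha.item():.3f}, risk-sensitive greedy action={action.item()}")
```

In this sketch the agent becomes more risk-averse exactly in states the predictor has not learned well, which is one plausible way to realize the automatic risk adaptation described above.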