Identifying uncertainty and taking mitigating actions is crucial for safe and trustworthy reinforcement learning agents, especially when deployed in high-risk environments. In this paper, risk sensitivity is promoted in a model-based reinforcement learning algorithm by exploiting the ability of a bootstrap ensemble of dynamics models to estimate environment epistemic uncertainty. We propose uncertainty-guided cross-entropy method planning, which penalises action sequences that result in high-variance state predictions during model rollouts, guiding the agent towards known, low-uncertainty regions of the state space. Experiments demonstrate the agent's ability to identify uncertain regions of the state space during planning and to take actions that keep it within high-confidence areas, without requiring explicit constraints. The result is a reduction in attained reward, illustrating a trade-off between risk and return.
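To make the core idea concrete, the sketch below shows one way uncertainty-penalised CEM planning can be structured: candidate action sequences are rolled out through every member of a bootstrap ensemble, and the score of each sequence is the mean predicted return minus a penalty proportional to the variance of the ensemble's state predictions. This is an illustrative assumption of the mechanism described above, not the paper's exact implementation; the `ensemble` member interface (`.predict`), `reward_fn`, and the `risk_coeff` weight are hypothetical names introduced here.

```python
import numpy as np

def cem_plan(state, ensemble, reward_fn, horizon=10, pop_size=100,
             n_elites=10, iters=5, action_dim=2, risk_coeff=1.0):
    """Uncertainty-guided CEM: penalise action sequences whose ensemble
    predictions disagree (high epistemic uncertainty). Illustrative sketch."""
    mean = np.zeros((horizon, action_dim))
    std = np.ones((horizon, action_dim))
    for _ in range(iters):
        # Sample candidate action sequences from the current distribution.
        candidates = mean + std * np.random.randn(pop_size, horizon, action_dim)
        scores = np.empty(pop_size)
        for i, seq in enumerate(candidates):
            # Roll the candidate through each dynamics model in the ensemble,
            # letting every member propagate its own state trajectory.
            states = np.repeat(state[None], len(ensemble), axis=0)
            ret, penalty = 0.0, 0.0
            for a in seq:
                states = np.stack([m.predict(s, a)
                                   for m, s in zip(ensemble, states)])
                ret += np.mean([reward_fn(s, a) for s in states])
                # Epistemic penalty: variance of the predicted next state
                # across ensemble members, summed over state dimensions.
                penalty += states.var(axis=0).sum()
            scores[i] = ret - risk_coeff * penalty
        # Refit the sampling distribution to the elite sequences.
        elite_seqs = candidates[np.argsort(scores)[-n_elites:]]
        mean, std = elite_seqs.mean(axis=0), elite_seqs.std(axis=0)
    return mean[0]  # execute the first action, then replan (MPC style)
```

Because the penalty enters the planning objective directly, no explicit constraint is needed: sequences leading into regions where the models disagree simply score poorly and are dropped from the elite set, while `risk_coeff` controls where the agent sits on the risk-return trade-off.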