We propose a novel benchmark environment for Safe Reinforcement Learning focused on aquatic navigation. Aquatic navigation is an extremely challenging task due to the non-stationary environment and the uncertainties of the robotic platform; it is therefore crucial to consider the safety aspect of the problem by analyzing the behavior of the trained network to avoid dangerous situations (e.g., collisions). To this end, we consider value-based and policy-gradient Deep Reinforcement Learning (DRL), and we propose a crossover-based strategy that combines gradient-based and gradient-free DRL to improve sample efficiency. Moreover, we propose a verification strategy based on interval analysis that checks the behavior of the trained models over a set of desired properties. Our results show that the crossover-based training outperforms prior DRL approaches, while our verification allows us to quantify the number of configurations that violate the behaviors described by the properties. Crucially, this work will serve as a benchmark for future research in this application domain.
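To illustrate the interval-analysis idea mentioned above, the sketch below propagates an input interval through a tiny feed-forward ReLU network and checks an output property. The network architecture, weights, and safety threshold here are purely illustrative assumptions, not the models or properties from the paper.

```python
# Minimal interval-bound propagation sketch (hypothetical weights).
# Sound over-approximation: if the output bound satisfies the property,
# the property holds for every input in the box; a violation flag may be
# a false alarm due to over-approximation.

def interval_affine(lows, highs, W, b):
    """Propagate the box [lows, highs] through y = Wx + b, coordinate-wise."""
    out_l, out_h = [], []
    for row, bias in zip(W, b):
        lo = bias + sum(w * (l if w >= 0 else h) for w, l, h in zip(row, lows, highs))
        hi = bias + sum(w * (h if w >= 0 else l) for w, l, h in zip(row, lows, highs))
        out_l.append(lo)
        out_h.append(hi)
    return out_l, out_h

def interval_relu(lows, highs):
    """ReLU is monotone, so it maps interval endpoints directly."""
    return [max(0.0, l) for l in lows], [max(0.0, h) for h in highs]

# Hypothetical 2-2-1 network (weights are illustrative only).
W1, b1 = [[1.0, -0.5], [0.3, 0.8]], [0.0, -0.1]
W2, b2 = [[0.6, -1.2]], [0.2]

# Example property: for any input in [0,1] x [0,1], the output stays below 1.0.
l, h = interval_affine([0.0, 0.0], [1.0, 1.0], W1, b1)
l, h = interval_relu(l, h)
l, h = interval_affine(l, h, W2, b2)
violated = h[0] >= 1.0
print(violated, h)
```

In the full verification setting, the input box would be a region of the state space (e.g., sensor readings near an obstacle), and counting the sub-boxes whose bounds violate a property yields the quantitative violation metric described in the abstract.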