We propose a novel benchmark environment for Safe Reinforcement Learning focused on aquatic navigation. Aquatic navigation is an extremely challenging task due to the non-stationary environment and the uncertainties of the robotic platform; it is therefore crucial to consider the safety aspect of the problem by analyzing the behavior of the trained network to avoid dangerous situations (e.g., collisions). To this end, we consider value-based and policy-gradient Deep Reinforcement Learning (DRL), and we propose a crossover-based strategy that combines gradient-based and gradient-free DRL to improve sample efficiency. Moreover, we propose a verification strategy based on interval analysis that checks the behavior of the trained models over a set of desired properties. Our results show that the crossover-based training outperforms prior DRL approaches, while our verification allows us to quantify the number of configurations that violate the behaviors described by the properties. Crucially, this work will serve as a benchmark for future research in this application domain.
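To illustrate the interval-analysis idea mentioned above, the sketch below propagates an input interval through a tiny feed-forward ReLU network and checks an output property. The network architecture, weights, and safety threshold here are purely illustrative assumptions, not the models or properties from the paper.

```python
# Minimal interval-bound propagation sketch (hypothetical weights).
# Sound over-approximation: if the output bound satisfies the property,
# the property holds for every input in the box; a violation flag may be
# a false alarm due to over-approximation.

def interval_affine(lows, highs, W, b):
    """Propagate the box [lows, highs] through y = Wx + b, coordinate-wise."""
    out_l, out_h = [], []
    for row, bias in zip(W, b):
        lo = bias + sum(w * (l if w >= 0 else h) for w, l, h in zip(row, lows, highs))
        hi = bias + sum(w * (h if w >= 0 else l) for w, l, h in zip(row, lows, highs))
        out_l.append(lo)
        out_h.append(hi)
    return out_l, out_h

def interval_relu(lows, highs):
    """ReLU is monotone, so it maps interval endpoints directly."""
    return [max(0.0, l) for l in lows], [max(0.0, h) for h in highs]

# Hypothetical 2-2-1 network (weights are illustrative only).
W1, b1 = [[1.0, -0.5], [0.3, 0.8]], [0.0, -0.1]
W2, b2 = [[0.6, -1.2]], [0.2]

# Example property: for any input in [0,1] x [0,1], the output stays below 1.0.
l, h = interval_affine([0.0, 0.0], [1.0, 1.0], W1, b1)
l, h = interval_relu(l, h)
l, h = interval_affine(l, h, W2, b2)
violated = h[0] >= 1.0
print(violated, h)
```

In the full verification setting, the input box would be a region of the state space (e.g., sensor readings near an obstacle), and counting the sub-boxes whose bounds violate a property yields the quantitative violation metric described in the abstract.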