Deep Reinforcement Learning (DRL) has achieved impressive performance in robotics and autonomous systems (RASs). A key impediment to its deployment in real-life operations is that DRL policies may be spuriously unsafe: unexplored states may lead the agent to make wrong decisions that cause hazards, especially in applications where the end-to-end controller of the RAS is trained by DRL. In this paper, we propose a novel quantitative reliability assessment framework for DRL-controlled RASs, leveraging verification evidence generated from formal reliability analysis of neural networks. A two-level verification framework is introduced to check the safety property with respect to inaccurate observations caused by, e.g., environmental noise and state changes. At the local level, reachability verification tools are leveraged to generate safety evidence for trajectories, while at the global level we quantify the overall reliability as an aggregated metric of the local safety evidence, according to an operational profile. The effectiveness of the proposed verification framework is demonstrated and validated via experiments on real RASs.
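As a rough, illustrative sketch of the global-level aggregation only (the notation here is introduced for illustration and is not necessarily the paper's own; the precise definition is given in the body of the paper): let $Op(x)$ denote the operational-profile probability of encountering state $x$, and let $\lambda(x)$ denote the probability of unsafe behaviour around $x$ established by local reachability verification. The overall unreliability can then be aggregated as a profile-weighted sum of the local evidence,
\[
  \mathit{Unreliability} \;\approx\; \sum_{x \in \mathcal{X}} Op(x)\,\lambda(x),
\]
where $\mathcal{X}$ is an assumed (discretised) set of operational states.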