实现深入强化学习的可调整核查 (Towards Scalable Verification of Deep Reinforcement Learning)

Deep neural networks (DNNs) have gained significant popularity in recent years, becoming the state of the art in a variety of domains. In particular, deep reinforcement learning (DRL) has recently been employed to train DNNs that realize control policies for various types of real-world systems. In this work, we present the whiRL 2.0 tool, which implements a new approach for verifying complex properties of interest for DRL systems. To demonstrate the benefits of whiRL 2.0, we apply it to case studies from the communication networks domain that have recently been used to motivate formal verification of DRL systems, and which exhibit characteristics that are conducive for scalable verification. We propose techniques for performing k-induction and semi-automated invariant inference on such systems, and leverage these techniques for proving safety and liveness properties that were previously impossible to verify due to the scalability barriers of prior approaches. Furthermore, we show how our proposed techniques provide insights into the inner workings and the generalizability of DRL systems. whiRL 2.0 is publicly available online.

翻译：近年来,深神经网络(DNNS)已获得显著支持,成为各个领域的最新先进技术,特别是,最近利用深强化学习(DRL)来培训能够实现各类现实世界系统控制政策的DNNS。在这项工作中,我们展示了THERL 2.0工具,该工具采用新方法来核查DRL系统感兴趣的复杂特性。为了展示WHERL 2.0的好处,我们将其应用于通信网络领域的案例研究,这些案例研究最近被用来鼓励对DRL系统进行正式核查,并展示了有利于可扩展核查的特征。我们提出了对此类系统进行 k-上传和半自动变异推论的技术,并利用这些技术来证明以前由于先前方法的可扩缩性障碍而无法核实的安全和活性特性。此外,我们展示了我们所提议的技术如何提供对DRL系统内部操作和可普及性的洞察力。THER 2.0在网上公开提供。

相关内容

深度强化学习

关注 154

深度强化学习 (DRL) 是一种使用深度学习技术扩展传统强化学习方法的一种机器学习方法。传统强化学习方法的主要任务是使得主体根据从环境中获得的奖赏能够学习到最大化奖赏的行为。然而，传统无模型强化学习方法需要使用函数逼近技术使得主体能够学习出值函数或者策略。在这种情况下，深度学习强大的函数逼近能力自然成为了替代人工指定特征的最好手段并为性能更好的端到端学习的实现提供了可能。

零样本文本分类，Zero-Shot Learning for Text Classification

专知会员服务

97+阅读 · 2020年5月31日