Evaluating the Robustness of Deep Reinforcement Learning for Autonomous and Adversarial Policies in a Multi-agent Urban Driving Environment

Deep reinforcement learning is actively used to train autonomous and adversarial car policies in simulated driving environments. Because many reinforcement learning algorithms are available and they have not been systematically compared across driving scenarios, it is unclear which ones are more effective for training and testing autonomous car software in single-agent and multi-agent driving environments. A benchmarking framework for comparing deep reinforcement learning algorithms in vision-based autonomous driving would open up possibilities for training better autonomous driving policies. Furthermore, autonomous cars trained with deep reinforcement learning algorithms are known to be vulnerable to adversarial attacks. To guard against such attacks, we can train autonomous cars on adversarial driving policies. However, it is not known which deep reinforcement learning algorithms make good adversarial agents that can effectively test autonomous cars. To address these challenges, we provide an open and reusable benchmarking framework for the systematic evaluation and comparative analysis of deep reinforcement learning algorithms for autonomous and adversarial driving in single- and multi-agent environments. Using the framework, we perform a comparative study of five discrete and two continuous action space deep reinforcement learning algorithms. We run the experiments in a vision-only, high-fidelity simulated urban driving environment. The results indicate that only some of the deep reinforcement learning algorithms perform consistently better across single- and multi-agent scenarios when trained in a multi-agent-only setting.

Related content

Deep reinforcement learning

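Deep reinforcement learning (DRL) is a machine learning approach that extends traditional reinforcement learning with deep learning techniques. The main task of traditional reinforcement learning is to let an agent learn, from the rewards it receives from the environment, the behavior that maximizes reward. However, traditional model-free reinforcement learning methods need function approximation for the agent to learn a value function or a policy. In that setting, the strong function approximation capability of deep learning naturally became the best way to replace hand-specified features, and it made better-performing end-to-end learning possible.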
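To make the role of function approximation concrete, here is a minimal sketch of Q-learning with a function approximator. It is an illustration only and is not taken from the paper or its benchmarking framework. The five-state chain environment, the `features`, `step`, and `train` helpers, and all hyperparameters are assumptions made for this example. A linear approximator over one-hot features stands in for the deep network that a DRL algorithm would use.

```python
import numpy as np

# Hypothetical toy problem: a 5-state chain, actions 0 = left, 1 = right.
N_STATES, N_ACTIONS = 5, 2
GAMMA, ALPHA, EPSILON = 0.9, 0.1, 0.2  # assumed hyperparameters


def features(state):
    """One-hot state features; a deep network would learn features instead."""
    phi = np.zeros(N_STATES)
    phi[state] = 1.0
    return phi


def step(state, action):
    """Move along the chain; reaching the last state pays 1 and ends the episode."""
    if action == 1:
        next_state = min(state + 1, N_STATES - 1)
    else:
        next_state = max(state - 1, 0)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, seed=0):
    """Semi-gradient Q-learning with Q(s, a) = weights[a] @ features(s)."""
    rng = np.random.default_rng(seed)
    weights = np.zeros((N_ACTIONS, N_STATES))
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            q_values = weights @ features(state)
            # Epsilon-greedy exploration.
            if rng.random() < EPSILON:
                action = int(rng.integers(N_ACTIONS))
            else:
                action = int(np.argmax(q_values))
            next_state, reward, done = step(state, action)
            # Bootstrapped target: the reward plus the discounted best next value.
            target = reward
            if not done:
                target += GAMMA * np.max(weights @ features(next_state))
            td_error = target - q_values[action]
            # Gradient of the linear Q-function w.r.t. weights[action] is features(state).
            weights[action] += ALPHA * td_error * features(state)
            state = next_state
    return weights


weights = train()
greedy_policy = [int(np.argmax(weights @ features(s))) for s in range(N_STATES - 1)]
```

In this toy setup the learned greedy policy moves right in every non-terminal state, which is the reward-maximizing behavior. A deep variant would replace the `weights @ features(state)` product with a neural network and update it by gradient descent on the same temporal-difference error.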