Autonomous vehicles (AVs) need to reason about the multimodal behavior of neighboring agents while planning their own motion. Many existing trajectory planners seek a single trajectory that performs well under \emph{all} plausible futures simultaneously, ignoring bi-directional interactions and thus leading to overly conservative plans. Policy planning, whereby the ego agent plans a policy that reacts to the environment's multimodal behavior, is a promising direction, as it can account for the action-reaction interactions between the AV and the environment. However, most existing policy planners do not scale to the complexity of real autonomous vehicle applications: they are either incompatible with modern deep-learning prediction models, not interpretable, or unable to generate high-quality trajectories. To fill this gap, we propose Tree Policy Planning (TPP), a policy planner that is compatible with state-of-the-art deep-learning prediction models, generates multistage motion plans, and accounts for the influence of the ego agent on the environment's behavior. The key idea of TPP is to reduce the continuous optimization problem to a tractable discrete Markov decision process (MDP) through the construction of two tree structures: an ego trajectory tree for ego trajectory options, and a scenario tree for multimodal ego-conditioned environment predictions. We demonstrate the efficacy of TPP in closed-loop simulations based on the real-world nuScenes dataset; the results show that TPP scales to realistic AV scenarios and significantly outperforms non-policy baselines.
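To make the two-tree construction concrete, the following is a minimal Python sketch of our own (not the authors' implementation): the names \texttt{EgoNode}, \texttt{ScenarioNode}, and \texttt{backup}, and the user-supplied \texttt{reward} function, are all hypothetical. It illustrates the expectimax-style value backup on the discrete MDP induced by pairing branches of the ego trajectory tree with branches of the scenario tree; the per-mode \texttt{max} over ego children is what encodes a reactive policy rather than a single fixed trajectory.

\begin{verbatim}
# Hypothetical sketch of TPP's two-tree idea, assuming stage-aligned
# trees and a user-supplied stage reward; not the authors' code.
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class EgoNode:
    """One candidate ego trajectory segment for a planning stage."""
    traj: object                      # e.g., array of (x, y, heading) states
    children: List["EgoNode"] = field(default_factory=list)

@dataclass
class ScenarioNode:
    """One ego-conditioned prediction mode for the other agents."""
    prediction: object                # e.g., predicted agent trajectories
    prob: float                       # mode probability (children sum to 1)
    children: List["ScenarioNode"] = field(default_factory=list)

def backup(ego: EgoNode, scen: ScenarioNode,
           reward: Callable[[object, object], float]) -> float:
    """Finite-horizon DP value of executing ego.traj under mode scen,
    then reacting optimally at the next stage (a policy, not a
    single trajectory)."""
    r = reward(ego.traj, scen.prediction)
    if not ego.children or not scen.children:
        return r
    # Expectation over next-stage scenario modes; for each mode the
    # ego is free to pick its best child branch.
    future = sum(
        s.prob * max(backup(e, s, reward) for e in ego.children)
        for s in scen.children
    )
    return r + future
\end{verbatim}

Under this reading, evaluating \texttt{backup} at the tree roots scores each first-stage ego option, and the argmax gives the first trajectory segment to execute while the remaining branches constitute the contingency policy.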