Learning to Play Trajectory Games Against Opponents with Unknown Objectives - Zhuanzhi Papers



March 22, 2023

Learning to Play Trajectory Games Against Opponents with Unknown Objectives


Xinjie Liu, Lasse Peters, Javier Alonso-Mora

Many autonomous agents, such as intelligent vehicles, are inherently required to interact with one another. Game theory provides a natural mathematical tool for robot motion planning in such interactive settings. However, tractable algorithms for such problems usually rely on a strong assumption, namely that the objectives of all players in the scene are known. To make such tools applicable for ego-centric planning with only local information, we propose an adaptive model-predictive game solver, which jointly infers other players' objectives online and computes a corresponding generalized Nash equilibrium (GNE) strategy. The adaptivity of our approach is enabled by a differentiable trajectory game solver whose gradient signal is used for maximum likelihood estimation (MLE) of opponents' objectives. This differentiability of our pipeline facilitates direct integration with other differentiable elements, such as neural networks (NNs). Furthermore, in contrast to existing solvers for cost inference in games, our method handles not only partial state observations but also general inequality constraints. In two simulated traffic scenarios, we find superior performance of our approach over both existing game-theoretic methods and non-game-theoretic model-predictive control (MPC) approaches. We also demonstrate our approach's real-time planning capabilities and robustness in two hardware experiments.
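The core idea in the abstract, using the gradient of a differentiable game solver for maximum likelihood estimation of an opponent's objective, can be illustrated with a deliberately small toy. The sketch below is not the authors' solver: the two-player scalar game, the coupling weight `W`, the ego goal `EGO_GOAL`, and the functions `solve_game`, `solution_sensitivity`, and `estimate_goal` are all hypothetical and chosen only for illustration. The game is unconstrained, so the generalized Nash equilibrium reduces to an ordinary Nash equilibrium with a closed-form solution; the inequality constraints and partial observations that the paper handles are left out.

```python
# Toy sketch (hypothetical, not the paper's solver): infer an opponent's
# unknown goal by differentiating through a game's equilibrium conditions.

W = 0.5         # assumed coupling weight: both players prefer to stay close
EGO_GOAL = 1.0  # assumed known goal of the ego player


def solve_game(theta):
    """Nash equilibrium of the two-player scalar game
        J1 = (x1 - EGO_GOAL)^2 + W * (x1 - x2)^2   (ego)
        J2 = (x2 - theta)^2    + W * (x1 - x2)^2   (opponent)
    Setting dJ1/dx1 = 0 and dJ2/dx2 = 0 gives A x = [EGO_GOAL, theta] with
    A = [[1 + W, -W], [-W, 1 + W]], which is inverted here in closed form."""
    det = (1 + W) ** 2 - W ** 2
    x1 = ((1 + W) * EGO_GOAL + W * theta) / det
    x2 = (W * EGO_GOAL + (1 + W) * theta) / det
    return x1, x2


def solution_sensitivity():
    """Derivative of the equilibrium with respect to theta, obtained from the
    implicit function theorem on the stationarity system: A dx/dtheta = [0, 1]."""
    det = (1 + W) ** 2 - W ** 2
    return W / det, (1 + W) / det


def estimate_goal(observations, theta0=0.0, step=0.5, iters=100):
    """Gradient descent on the mean squared error between observed and
    predicted equilibria. Under isotropic Gaussian observation noise this is
    the negative log-likelihood up to constants, so the minimizer is the MLE."""
    theta = theta0
    dx1, dx2 = solution_sensitivity()
    for _ in range(iters):
        x1, x2 = solve_game(theta)
        grad = 0.0
        for y1, y2 in observations:
            # Chain rule: d(loss)/d(theta) = -2 * residual . dx*/dtheta
            grad += -2.0 * ((y1 - x1) * dx1 + (y2 - x2) * dx2)
        theta -= step * grad / len(observations)
    return theta


true_theta = 3.0
# Noise-free observations keep the demo deterministic.
observations = [solve_game(true_theta)] * 5
estimate = estimate_goal(observations)
print(estimate)
```

With noise-free observations the estimate converges to `true_theta`. In a receding-horizon setting one would presumably re-run such an estimation step at every planning cycle and then plan against the equilibrium computed with the current estimate, which is the adaptive loop the abstract describes at a high level.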

