Cooperative trajectory planning methods for automated vehicles can solve traffic scenarios that require a high degree of cooperation between traffic participants. For cooperative systems to integrate into human-centered traffic, it is important that the automated systems behave in a human-like manner, so that humans can anticipate their decisions. While Reinforcement Learning has made remarkable progress in solving the decision-making part, it is non-trivial to parameterize a reward model that yields predictable actions. This work employs feature-based Maximum Entropy Inverse Reinforcement Learning in combination with Monte Carlo Tree Search to learn a reward model that maximizes the likelihood of recorded multi-agent cooperative expert trajectories. The evaluation demonstrates that the approach is capable of recovering a reasonable reward model that mimics the expert and performs similarly to a manually tuned baseline reward model.
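
To make the learning objective concrete, the following is a minimal sketch of the standard feature-based Maximum Entropy IRL gradient for a linear reward, not the implementation used in this work. It assumes that each trajectory is summarized by a feature vector, that the reward is the dot product of a weight vector `theta` with those features, and that a fixed set of candidate trajectories (for example, ones produced by a tree search) stands in for the full trajectory distribution. The names `maxent_irl_gradient`, `fit_reward`, `expert_features`, and `sample_features` are hypothetical.

```python
import numpy as np

def maxent_irl_gradient(theta, expert_features, sample_features):
    """Log-likelihood gradient for a linear reward theta . f(trajectory).

    expert_features: (n_expert, n_features) features of expert trajectories.
    sample_features: (n_samples, n_features) features of candidate
        trajectories (assumed here to come from a planner such as MCTS).
    """
    rewards = sample_features @ theta
    rewards = rewards - rewards.max()      # shift for numerical stability
    probs = np.exp(rewards)
    probs /= probs.sum()                   # MaxEnt distribution over samples
    expected_features = probs @ sample_features
    # Gradient: empirical expert features minus expected features.
    return expert_features.mean(axis=0) - expected_features

def fit_reward(expert_features, sample_features, lr=0.1, steps=500):
    """Plain gradient ascent on the reward weights (assumed hyperparameters)."""
    theta = np.zeros(expert_features.shape[1])
    for _ in range(steps):
        theta += lr * maxent_irl_gradient(theta, expert_features, sample_features)
    return theta

# Toy data: the expert matches the first of two candidate trajectories.
expert = np.array([[1.0, 0.0]])
samples = np.array([[1.0, 0.0], [0.0, 1.0]])
theta = fit_reward(expert, samples)
```

In this toy setting the learned weights favor the feature the expert exhibits. A fuller treatment would re-sample the candidate trajectories under the current reward at each iteration rather than keeping them fixed.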