This paper focuses on inverse reinforcement learning for autonomous navigation using distance and semantic category observations. The objective is to infer a cost function that explains demonstrated behavior while relying only on the expert's observations and state-control trajectory. We develop a map encoder that infers semantic category probabilities from the observation sequence, and a cost encoder, defined as a deep neural network over the semantic features. Since the expert cost is not directly observable, the model parameters can only be optimized by differentiating the error between demonstrated controls and a control policy computed from the cost estimate. We propose a new model of expert behavior that enables error minimization using a closed-form subgradient computed only over a subset of promising states by a motion planning algorithm. Our approach allows generalizing the learned behavior to new environments with different spatial configurations of the semantic categories. We analyze the different components of our model in a minigrid environment. We also demonstrate that our approach learns to follow traffic rules in the CARLA autonomous driving simulator by relying on semantic observations of buildings, sidewalks, and road lanes.
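To make the two-stage structure concrete, the sketch below shows one possible way the map encoder and cost encoder could be wired together as differentiable modules. This is a minimal illustration under assumed architectures (a recurrent fusion of observations and a small feed-forward cost head); the module names, layer sizes, and tensor shapes are hypothetical and not taken from the paper. In the full approach, a planner run on the resulting cost map would produce a control policy, and the mismatch between that policy and the demonstrated controls would supply the training signal that is backpropagated through both encoders.

```python
import torch
import torch.nn as nn


class MapEncoder(nn.Module):
    """Hypothetical map encoder: fuses a sequence of per-cell semantic
    observations into per-cell semantic category probabilities."""

    def __init__(self, num_classes: int, hidden_dim: int = 32):
        super().__init__()
        self.gru = nn.GRUCell(num_classes, hidden_dim)
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, obs_seq, hidden):
        # obs_seq: list of (num_cells, num_classes) observation tensors
        # hidden:  (num_cells, hidden_dim) recurrent state, one per map cell
        for obs in obs_seq:
            hidden = self.gru(obs, hidden)
        # Per-cell semantic class probabilities
        return self.head(hidden).softmax(dim=-1)


class CostEncoder(nn.Module):
    """Hypothetical cost encoder: maps semantic class probabilities
    to a nonnegative traversal cost for each map cell."""

    def __init__(self, num_classes: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_classes, 16),
            nn.ReLU(),
            nn.Linear(16, 1),
            nn.Softplus(),  # keep costs nonnegative
        )

    def forward(self, class_probs):
        # class_probs: (num_cells, num_classes) -> (num_cells,) cost map
        return self.net(class_probs).squeeze(-1)


# Assumed shapes for illustration only.
num_cells, num_classes = 64, 4
obs_seq = [torch.rand(num_cells, num_classes) for _ in range(5)]
hidden = torch.zeros(num_cells, 32)

map_enc, cost_enc = MapEncoder(num_classes), CostEncoder(num_classes)
cost_map = cost_enc(map_enc(obs_seq, hidden))  # differentiable end to end
```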