This paper proposes a method for learning continuous control policies for active landmark localization and exploration using an information-theoretic cost. We consider a mobile robot detecting landmarks within a limited sensing range, and tackle the problem of learning a control policy that maximizes the mutual information between the landmark states and the sensor observations. We employ a Kalman filter to convert the partially observable problem in the landmark state into a Markov decision process (MDP), a differentiable field of view to shape the reward, and an attention-based neural network to represent the control policy. The approach is further unified with active volumetric mapping to promote exploration in addition to landmark localization. The performance is demonstrated in several simulated landmark localization tasks in comparison with benchmark methods.
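To make the information-theoretic cost concrete, the following is a minimal sketch of how mutual information can be computed under a linear-Gaussian measurement model with a Kalman filter, where it reduces to the log-determinant entropy reduction of the landmark covariance. The function name `kf_mutual_information` and its parameters are illustrative assumptions, not an interface from the paper.

```python
import numpy as np

def kf_mutual_information(Sigma, H, R):
    """Mutual information I(x; z) between a Gaussian landmark state
    with prior covariance Sigma and a linear observation z = H x + v,
    v ~ N(0, R). For Gaussians, I(x; z) equals the reduction in
    differential entropy after the Kalman covariance update."""
    S = H @ Sigma @ H.T + R                    # innovation covariance
    K = Sigma @ H.T @ np.linalg.inv(S)         # Kalman gain
    Sigma_post = Sigma - K @ H @ Sigma         # posterior covariance
    _, logdet_prior = np.linalg.slogdet(Sigma)
    _, logdet_post = np.linalg.slogdet(Sigma_post)
    return 0.5 * (logdet_prior - logdet_post)  # entropy reduction in nats

# Example: a 2-D landmark position observed directly with noisy sensing.
Sigma = np.diag([1.0, 4.0])       # prior uncertainty over landmark state
H = np.eye(2)                     # observation model
R = 0.25 * np.eye(2)              # sensor noise covariance
reward = kf_mutual_information(Sigma, H, R)
```

In a reward-shaping setting such as the one described above, one would additionally scale this quantity by a (differentiable) field-of-view weight so that landmarks outside the sensing range contribute no information gain.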