In this work we propose a coverage planning control approach which allows a mobile agent, equipped with a controllable sensor (e.g., a camera) with a limited sensing domain (i.e., finite sensing range and angle of view), to cover the surface area of an object of interest. The proposed approach integrates ray-tracing into the coverage planning process, thus allowing the agent to identify which parts of the scene are visible at any point in time. The problem of integrated ray-tracing and coverage planning control is first formulated as a constrained optimal control problem (OCP), which aims at determining the agent's optimal control inputs over a finite planning horizon that minimize the coverage time. Efficiently solving the resulting OCP is, however, very challenging due to non-convex and non-linear visibility constraints. To overcome this limitation, the problem is converted into a Markov decision process (MDP), which is then solved using reinforcement learning. In particular, we show that a controller which follows an optimal control law can be learned using off-policy temporal-difference control (i.e., Q-learning). Extensive numerical experiments demonstrate the effectiveness of the proposed approach for various configurations of the agent and the object of interest.
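To make the learning step concrete, the sketch below shows tabular Q-learning on a heavily simplified, hypothetical coverage MDP: the agent moves on a small grid, "covers" a target cell simply by visiting it, and receives a reward of -1 per step to encode the minimum-time objective. All names, sizes, and hyperparameters here (GRID, TARGETS, alpha, gamma, eps) are illustrative assumptions and not taken from the paper; in particular, the paper's actual formulation involves a controllable sensor with ray-traced visibility constraints, which this toy example omits.

```python
# Hypothetical, simplified sketch: tabular Q-learning on a toy coverage MDP.
# The paper's method uses ray-traced visibility and a controllable sensor;
# here the "sensor" simply covers the cell the agent currently occupies.
import random
from collections import defaultdict

GRID = 4                                    # 4x4 workspace (assumed)
TARGETS = {(1, 1), (1, 2), (2, 1), (2, 2)}  # cells standing in for the object surface
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]

def step(state, action):
    """Move the agent, update the covered set, and return (next_state, reward, done)."""
    (x, y), covered = state
    nx = min(max(x + action[0], 0), GRID - 1)
    ny = min(max(y + action[1], 0), GRID - 1)
    new_covered = covered | ({(nx, ny)} if (nx, ny) in TARGETS else set())
    done = new_covered == TARGETS
    # A cost of -1 per step encodes the minimum-coverage-time objective.
    return ((nx, ny), frozenset(new_covered)), -1.0, done

Q = defaultdict(float)           # Q[(state, action_index)] -> value
alpha, gamma, eps = 0.1, 0.95, 0.1

for episode in range(5000):
    state = ((0, 0), frozenset())
    for _ in range(200):
        # Epsilon-greedy behavior policy (off-policy learning of the greedy policy).
        if random.random() < eps:
            a = random.randrange(len(ACTIONS))
        else:
            a = max(range(len(ACTIONS)), key=lambda i: Q[(state, i)])
        next_state, r, done = step(state, ACTIONS[a])
        # Q-learning target uses the max over next actions (temporal-difference control).
        target = r if done else r + gamma * max(Q[(next_state, i)] for i in range(len(ACTIONS)))
        Q[(state, a)] += alpha * (target - Q[(state, a)])
        state = next_state
        if done:
            break
```

After training, acting greedily with respect to Q traces a short tour over the target cells; the paper's setting replaces the visit-to-cover rule with visibility computed by ray-tracing and the grid moves with the agent's actual control inputs.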