We investigate the problem of risk-averse robot path planning from the perspectives of deep reinforcement learning and distributionally robust optimization. Our formulation models the robot as a stochastic linear dynamical system and assumes that a collection of process noise samples is available. We cast the risk-averse motion planning problem as a Markov decision process and propose a continuous reward function design that explicitly accounts for the risk of collision with obstacles while encouraging the robot's progress towards the goal. We learn risk-averse robot control actions through Lipschitz-approximated Wasserstein distributionally robust deep Q-learning, which hedges against uncertainty in the noise distribution. The learned control actions yield a safe, risk-averse trajectory from the source to the goal that avoids all obstacles. Numerical simulations are presented to demonstrate the proposed approach.
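To make the setup concrete, the sketch below is a minimal illustration (not the authors' implementation) of the two ingredients the abstract names: a stochastic linear system x_{t+1} = A x_t + B u_t + w_t driven by an empirical collection of noise samples, and a Lipschitz-penalized Wasserstein distributionally robust Bellman target. The matrices A and B, the ambiguity radius eps, the Lipschitz bound lip, and the discount gamma are all illustrative assumptions.

```python
# Hypothetical sketch, assuming a 2-state linear system and an empirical
# noise dataset; all constants (A, B, eps, lip, gamma) are illustrative.
import numpy as np

rng = np.random.default_rng(0)

# Stochastic linear dynamics: x_{t+1} = A x_t + B u_t + w_t,
# with w_t drawn from a finite collection of process-noise samples.
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])               # state transition (position, velocity)
B = np.array([[0.0],
              [0.1]])                    # control input matrix
noise_samples = 0.01 * rng.standard_normal((100, 2))  # empirical noise data

def step(x, u):
    """Propagate the state using one noise sample drawn from the data."""
    w = noise_samples[rng.integers(len(noise_samples))]
    return A @ x + B @ u + w

def dr_bellman_target(r, next_states, q_max, gamma=0.99, eps=0.05, lip=1.0):
    """Lipschitz-approximated Wasserstein-DR target.

    If the state-value function s -> max_a Q(s, a) is lip-Lipschitz, then
    its worst-case expectation over a Wasserstein ball of radius eps around
    the empirical noise distribution is lower-bounded by the empirical mean
    minus eps * lip, giving a pessimistic (risk-averse) learning target.
    """
    empirical = np.mean([q_max(s) for s in next_states])
    return r + gamma * (empirical - eps * lip)
```

The penalty term eps * lip is the dual-based surrogate for the inner worst-case expectation: larger ambiguity radii or looser Lipschitz bounds produce more conservative Q-value targets, and hence more risk-averse control actions.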