The paper introduces DiSProD, an online planner developed for environments with probabilistic transitions in continuous state and action spaces. DiSProD builds a symbolic graph that captures the distribution of future trajectories, conditioned on a given policy, using independence assumptions and approximate propagation of distributions. The symbolic graph provides a differentiable representation of the policy's value, enabling efficient gradient-based optimization for long-horizon search. The propagation of approximate distributions can be seen as an aggregation of many trajectories, making it well-suited for dealing with sparse rewards and stochastic environments. An extensive experimental evaluation compares DiSProD to state-of-the-art planners in discrete-time planning and real-time control of robotic systems. The proposed method improves over existing planners in handling stochastic environments, sensitivity to search depth, sparsity of rewards, and large action spaces. Additional real-world experiments demonstrate that DiSProD can control ground vehicles and surface vessels to successfully navigate around obstacles.
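The core idea of propagating an approximate distribution and optimizing actions by gradient can be illustrated with a minimal sketch. This is not the authors' implementation: the 1-D dynamics, the quadratic reward, and every name below (`expected_return`, `numerical_gradient`, `plan`) are assumptions made for illustration. Central finite differences stand in for the automatic differentiation that a symbolic graph would provide. The state is summarized by a mean and a variance only, and the noise is assumed independent of the state.

```python
def expected_return(actions, mu0=0.0, var0=0.0, goal=1.0, noise_std=0.5):
    """Propagate (mean, variance) of a 1-D state and sum the expected rewards."""
    mu, var, total = mu0, var0, 0.0
    for a in actions:
        # Assumed toy dynamics: s' = s + a * (1 + eps), eps ~ N(0, noise_std^2),
        # with eps independent of s, so means add and variances add.
        mu = mu + a
        var = var + (a * noise_std) ** 2
        # Assumed reward r(s) = -(s - goal)^2, hence E[r] = -((mu - goal)^2 + var).
        total += -((mu - goal) ** 2 + var)
    return total


def numerical_gradient(f, x, h=1e-6):
    """Central finite differences; a stand-in for automatic differentiation."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        up[i] += h
        down = list(x)
        down[i] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad


def plan(horizon=5, steps=500, lr=0.05):
    """Gradient ascent on the expected return of an open-loop action sequence."""
    actions = [0.0] * horizon
    for _ in range(steps):
        g = numerical_gradient(expected_return, actions)
        actions = [a + lr * gi for a, gi in zip(actions, g)]
    return actions


if __name__ == "__main__":
    best = plan()
    print("actions:", [round(a, 3) for a in best])
    print("expected return:", round(expected_return(best), 4))
```

Because action-dependent noise enters through the variance term, the optimizer in this toy problem trades off reaching the goal quickly against the uncertainty that large actions introduce, which is the kind of behaviour a single sampled trajectory would not expose.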