Path planning in dynamic environments is a fundamental challenge in intelligent transportation and robotics, where obstacles and conditions change over time, introducing uncertainty and requiring continuous adaptation. Existing approaches often assume complete environmental unpredictability or rely on global planners, and these assumptions limit scalability and practical deployment in real-world settings. In this paper, we propose a scalable, region-aware reinforcement learning (RL) framework for path planning in dynamic environments. Our method builds on the observation that environmental changes, although dynamic, are often localized within bounded regions. To exploit this, we introduce a hierarchical decomposition of the environment and deploy distributed RL agents that adapt to changes locally. We further propose a retraining mechanism based on sub-environment success rates to determine when policy updates are necessary. Two training paradigms are explored: single-agent Q-learning and multi-agent federated Q-learning, where local Q-tables are aggregated periodically to accelerate learning. Unlike prior work, we evaluate our methods in more realistic settings featuring multiple simultaneous obstacle changes and increasing difficulty levels. Results show that the federated variants consistently outperform their single-agent counterparts and closely approach the performance of the A* Oracle while maintaining shorter adaptation times and robust scalability. Although initial training remains time-consuming in large environments, our decentralized framework eliminates the need for a global planner and lays the groundwork for future improvements using deep RL and flexible environment decomposition.
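The federated variant periodically combines the per-region Q-tables into a shared table that the local agents then continue training from. A minimal sketch of this aggregation step is given below; the abstract does not specify the aggregation operator, so uniform (or optionally weighted) averaging over agents is assumed, and the function name, table dimensions, and action set are hypothetical.

```python
import numpy as np

def aggregate_q_tables(local_q_tables, weights=None):
    """Federated aggregation step: combine per-region Q-tables into one shared table.

    Assumption: simple (optionally weighted) averaging is used here for illustration;
    the paper only states that local Q-tables are aggregated periodically.
    """
    q_stack = np.stack(local_q_tables)  # shape: (n_agents, n_states, n_actions)
    if weights is None:
        weights = np.full(len(local_q_tables), 1.0 / len(local_q_tables))
    return np.tensordot(weights, q_stack, axes=1)  # weighted average over agents

# Illustrative usage with hypothetical dimensions: 4 sub-environment agents,
# 25 states per region, 4 actions (up/down/left/right).
rng = np.random.default_rng(0)
local_tables = [rng.random((25, 4)) for _ in range(4)]
global_q = aggregate_q_tables(local_tables)
# Each agent would resume local Q-learning from the aggregated table
# until the next aggregation round.
```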