Autonomous drone technology holds significant promise for search and rescue during evacuations: drones can guide people toward safety and support the broader emergency response. However, its application to dynamic, real-time evacuation support remains limited. Existing models often overlook the psychological and emotional complexity of human behavior under extreme stress. In real-world fire scenarios, evacuees frequently deviate from designated safe routes because of panic and uncertainty. To address these challenges, this paper presents a multi-agent coordination framework in which autonomous Unmanned Aerial Vehicles (UAVs) assist human evacuees in real time by locating, intercepting, and guiding them to safety under uncertain conditions. We model the problem as a Partially Observable Markov Decision Process (POMDP) in which two heterogeneous UAV agents, a high-level rescuer (HLR) and a low-level rescuer (LLR), coordinate through shared observations and complementary capabilities. Human behavior is captured by an agent-based model grounded in empirical psychology, in which panic dynamically affects decision-making and movement in response to environmental stimuli. The environment features stochastic fire spread, unknown evacuee locations, and limited visibility, so the UAVs must plan over long horizons to search for evacuees and adapt in real time. Our framework employs the Proximal Policy Optimization (PPO) algorithm with recurrent policies to enable robust decision-making in partially observable settings. Simulation results demonstrate that the UAV team can rapidly locate and intercept evacuees, significantly reducing the time they need to reach safety compared with scenarios without UAV assistance.
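To make the panic-driven evacuee model more concrete, the following is a minimal illustrative sketch, not the model used in this paper. It assumes a grid world, a scalar panic level in [0, 1] that rises with proximity to fire and decays when a UAV is guiding the evacuee, and a movement rule in which a panicked evacuee takes a random step instead of heading for the exit. The function names `update_panic` and `choose_move`, the `rate` and calming constants, and the update rule itself are all hypothetical.

```python
import random

# Four-connected grid moves (assumed discretization, not from the paper).
MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def update_panic(panic, fire_distance, uav_guiding, rate=0.2):
    """Hypothetical panic update: rises near fire, decays under UAV guidance."""
    stimulus = 1.0 / (1.0 + fire_distance)  # closer fire gives a stronger stimulus
    calming = 0.5 if uav_guiding else 0.0   # assumed calming effect of a nearby UAV
    panic = panic + rate * (stimulus - calming * panic)
    return min(1.0, max(0.0, panic))        # clamp to [0, 1]


def choose_move(pos, exit_pos, panic, rng):
    """With probability `panic`, take a random step; otherwise step toward the exit."""
    if rng.random() < panic:
        dx, dy = rng.choice(MOVES)
    else:
        # Greedy step that minimizes Manhattan distance to the exit.
        dx, dy = min(
            MOVES,
            key=lambda m: abs(pos[0] + m[0] - exit_pos[0])
            + abs(pos[1] + m[1] - exit_pos[1]),
        )
    return (pos[0] + dx, pos[1] + dy)


if __name__ == "__main__":
    rng = random.Random(0)
    pos, exit_pos, panic = (0, 0), (5, 0), 0.0
    for step in range(10):
        panic = update_panic(panic, fire_distance=2.0, uav_guiding=(step >= 5))
        pos = choose_move(pos, exit_pos, panic, rng)
        print(step, pos, round(panic, 3))
```

In this toy rule, a calm evacuee (panic 0) always moves toward the exit, while a fully panicked one (panic 1) moves at random, which is one simple way to reproduce deviation from designated safe routes. A UAV policy trained with PPO would observe such evacuees only partially and would have to act to bring their panic down.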