Timely and accurate situational reports are essential for humanitarian decision-making, yet current workflows remain largely manual, resource-intensive, and inconsistent. We present a fully automated framework that uses large language models (LLMs) to transform heterogeneous humanitarian documents into structured, evidence-grounded reports. The system integrates semantic text clustering, automatic question generation, retrieval-augmented answer extraction with citations, multi-level summarization, and executive summary generation, supported by internal evaluation metrics that emulate expert reasoning. We evaluated the framework across 13 humanitarian events, including natural disasters and conflicts, using more than 1,100 documents from verified sources such as ReliefWeb. The generated questions achieved 84.7 percent relevance, 84.0 percent importance, and 76.4 percent urgency. The extracted answers reached 86.3 percent relevance, with citation precision and recall both exceeding 76 percent. Agreement between human and LLM-based evaluations exceeded an F1 score of 0.80. Comparative analysis shows that the proposed framework produces reports that are more structured, interpretable, and actionable than existing baselines. By combining LLM reasoning with transparent citation linking and multi-level evaluation, this study demonstrates that generative AI can autonomously produce accurate, verifiable, and operationally useful humanitarian situation reports.
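As a concrete illustration of the citation metrics reported above, the following is a minimal sketch, not the authors' implementation, of how set-based citation precision and recall could be computed for a single extracted answer. The document identifiers and the `citation_precision_recall` helper are hypothetical placeholders introduced only for this example.

```python
# Minimal sketch: set-based citation precision and recall for one extracted answer.
# The document IDs below are hypothetical placeholders, not data from the study.

def citation_precision_recall(predicted: set[str], gold: set[str]) -> tuple[float, float]:
    """Return (precision, recall) of predicted citations against gold citations."""
    if not predicted or not gold:
        return 0.0, 0.0
    true_positives = len(predicted & gold)   # citations that match the gold set
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return precision, recall

# Example: the answer cites three sources, two of which appear in a gold set of four.
predicted_citations = {"reliefweb:doc-101", "reliefweb:doc-205", "reliefweb:doc-309"}
gold_citations = {"reliefweb:doc-101", "reliefweb:doc-205", "reliefweb:doc-410", "reliefweb:doc-512"}

p, r = citation_precision_recall(predicted_citations, gold_citations)
print(f"citation precision = {p:.2f}, recall = {r:.2f}")  # precision = 0.67, recall = 0.50
```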