Decision-makers in the humanitarian sector rely on timely and accurate information during crisis events. Knowing how many civilians were injured during an earthquake is vital to allocate aid properly. Information about such victim counts is often only available within full-text event descriptions from newspapers and other reports. Extracting numbers from text is challenging: numbers have different formats and may require numeric reasoning. This renders purely string-matching-based approaches insufficient. As a consequence, fine-grained counts of injured, displaced, or abused victims beyond fatalities are often not extracted and remain unseen. We cast victim count extraction as a question answering (QA) task with a regression or classification objective. We compare regex, dependency parsing, and semantic role labeling-based approaches, as well as advanced text-to-text models. Beyond model accuracy, we analyze extraction reliability and robustness, which are key for this sensitive task. In particular, we discuss model calibration and investigate few-shot and out-of-distribution performance. Ultimately, we make a comprehensive recommendation on which model to select for different desiderata and data domains. Our work is among the first to apply numeracy-focused large language models in a real-world use case with a positive impact.