Extracting Victim Counts from Text

Decision-makers in the humanitarian sector rely on timely and exact information during crisis events. Knowing how many civilians were injured during an earthquake is vital for allocating aid properly. Information about such victim counts is often available only within full-text event descriptions from newspapers and other reports. Extracting numbers from text is challenging: numbers have different formats and may require numeric reasoning. This renders purely string-matching-based approaches insufficient. As a consequence, fine-grained counts of injured, displaced, or abused victims beyond fatalities are often not extracted and remain unseen. We cast victim count extraction as a question answering (QA) task with a regression or classification objective. We compare regex-, dependency parsing-, and semantic role labeling-based approaches with advanced text-to-text models. Beyond model accuracy, we analyze extraction reliability and robustness, which are key for this sensitive task. In particular, we discuss model calibration and investigate few-shot and out-of-distribution performance. Ultimately, we make comprehensive recommendations on which model to select for different desiderata and data domains. Our work is among the first to apply numeracy-focused large language models in a real-world use case with a positive impact.
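To make the point about string matching concrete, here is a minimal, hypothetical regex baseline in Python. It is not the authors' regex system: the pattern, the tiny word-to-number table, and the function name `regex_injured_count` are illustrative assumptions. It normalizes a few number formats (digits, thousands separators, spelled-out words), but it cannot add up counts spread across a sentence, which is exactly the kind of numeric reasoning the abstract says such approaches miss.

```python
import re
from typing import Optional

# Hypothetical lookup for spelled-out numbers; deliberately tiny.
WORD_NUMBERS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
                "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10}

# A number (digits with optional thousands separators, plain digits, or a
# number word), then at most two other words, an optional "were", and
# finally an injury keyword.
NUMBER = r"\d{1,3}(?:,\d{3})+|\d+|" + "|".join(WORD_NUMBERS)
PATTERN = re.compile(
    rf"\b({NUMBER})\b(?:\s+\w+){{0,2}}?\s+(?:were\s+)?(?:injured|wounded)\b",
    re.IGNORECASE,
)


def to_int(token: str) -> int:
    """Normalize a matched number token ("1,200", "Two", "7") to an int."""
    token = token.lower().replace(",", "")
    return WORD_NUMBERS[token] if token in WORD_NUMBERS else int(token)


def regex_injured_count(text: str) -> Optional[int]:
    """Return the first matched injured count, or None if nothing matches."""
    match = PATTERN.search(text)
    return to_int(match.group(1)) if match else None


if __name__ == "__main__":
    print(regex_injured_count("At least 1,200 people were injured in the quake."))  # 1200
    print(regex_injured_count("Two children and three adults were injured."))  # 3, but the true total is 5
    print(regex_injured_count("Dozens of people were injured."))  # None
```

The second call shows the failure mode: the pattern latches onto the number closest to the keyword and ignores the other group, so a correct answer requires summing, not matching. Vague quantifiers such as "dozens" yield no answer at all.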


Related Content

MoDELS


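The ACM/IEEE 23rd International Conference on Model Driven Engineering Languages and Systems (MODELS) is the premier conference series for model-driven software and systems engineering, organized with the support of ACM SIGSOFT and IEEE TCSE. Since 1998, MODELS has covered all aspects of modeling, from languages and methods to tools and applications. MODELS participants come from diverse backgrounds, including researchers, academics, engineers, and industry professionals. MODELS 2019 is a forum where participants can exchange cutting-edge research results and innovative practical experience around modeling and model-driven software and systems. This year's edition will give the modeling community the opportunity to further advance the foundations of modeling and to present innovative applications of modeling in emerging areas such as cyber-physical systems, embedded systems, socio-technical systems, cloud computing, big data, machine learning, security, open source, and sustainability. Official website: http://www.modelsconference.org/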