Social media increasingly plays a key role in emergency response: first responders can use public posts to react to ongoing crisis events and deploy resources where they are most needed. Timeline extraction and abstractive summarization are critical technical tasks for making use of the large volume of social media posts about such events. Unfortunately, there are few datasets for benchmarking technical approaches to these tasks. This paper presents CrisisLTLSum, the largest dataset of local crisis event timelines available to date. CrisisLTLSum contains 1,000 crisis event timelines across four domains: wildfires, local fires, traffic, and storms. We built CrisisLTLSum using a semi-automated cluster-then-refine approach to collect data from the public Twitter stream. Our initial experiments indicate a significant gap between the performance of strong baselines and human performance on both tasks. Our dataset, code, and models are publicly available.