Relevant and timely information collected from social media during crises can be an invaluable resource for emergency management. However, extracting this information remains challenging, particularly when the postings are written in multiple languages. This work proposes a cross-lingual method for retrieving and summarizing crisis-relevant information from social media postings. We describe a uniform way of expressing diverse information needs through structured queries, and a way of creating summaries that answer those needs. The method is based on multilingual transformer embeddings: queries are written in one of the languages supported by the embeddings, and the extracted sentences can be in any of the other supported languages. Abstractive summaries are then generated by transformers. The evaluation, performed by crowdsourced evaluators and emergency management experts on collections extracted from Twitter during five large-scale disasters and spanning ten languages, shows the flexibility of our approach. The generated summaries are regarded as more focused, structured, and coherent than those produced by existing state-of-the-art methods, and experts compare them favorably against the summaries those methods produce.
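The retrieval step described above, where a query in one language is matched against sentences in other languages through a shared embedding space, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `retrieve` and `toy_embed`, the `CONCEPTS` table, and the example posts are all hypothetical, and `toy_embed` is a hand-built stand-in for a real multilingual transformer encoder, used only so that the example runs without external models. Cosine similarity is assumed as the ranking score.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def retrieve(
    query: str,
    sentences: List[str],
    embed: Callable[[str], Vector],
    top_k: int = 2,
) -> List[Tuple[float, str]]:
    """Rank sentences by similarity to the query in the shared space."""
    q = embed(query)
    scored = [(cosine(q, embed(s)), s) for s in sentences]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Hypothetical toy encoder: words from different languages that share a
# meaning map to the same dimension. A real system would call a
# multilingual transformer encoder here instead.
CONCEPTS = {
    "road": 0, "carretera": 0, "route": 0,
    "blocked": 1, "bloqueada": 1, "bloquée": 1,
    "concert": 2, "concierto": 2,
}


def toy_embed(text: str) -> List[float]:
    vec = [0.0, 0.0, 0.0]
    for token in text.lower().split():
        idx = CONCEPTS.get(token.strip(".,!?"))
        if idx is not None:
            vec[idx] += 1.0
    return vec


# Made-up example posts in Spanish and French; the query is in English.
posts = [
    "La carretera está bloqueada.",
    "La route est bloquée.",
    "Gran concierto esta noche!",
]
results = retrieve("road blocked", posts, toy_embed, top_k=2)
for score, sentence in results:
    print(f"{score:.2f}  {sentence}")
```

Passing the encoder in as the `embed` argument keeps the ranking logic independent of any particular model, so the toy encoder could be swapped for a real multilingual one without changing `retrieve`. The abstractive summarization step is not sketched here.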