HumAID: Human-Annotated Disaster Incidents Data from Twitter

Social networks are widely used for information consumption and dissemination, especially during time-critical events such as natural disasters. Despite its large volume, social media content is often too noisy for direct use in any application. Therefore, it is important to filter, categorize, and concisely summarize the available content to facilitate effective consumption and decision-making. To address such issues, automatic classification systems have been developed using supervised modeling approaches, building on earlier efforts to create labeled datasets. However, existing datasets are limited in several respects (e.g., small size, presence of duplicates) and are less suitable for supporting more advanced, data-hungry deep learning models. In this paper, we present a new large-scale dataset with ~77K human-labeled tweets, sampled from a pool of ~24 million tweets across 19 disaster events that happened between 2016 and 2019. Moreover, we propose a data collection and sampling pipeline, which is important when sampling social media data for human annotation. We report multiclass classification results using classic and deep learning (fastText and transformer) based models to set the ground for future studies. The dataset and associated resources are publicly available at https://crisisnlp.qcri.org/humaid_dataset.html
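To make the duplicate problem concrete, the following is a minimal sketch of how duplicate tweets might be filtered before drawing a sample for human annotation. It is not the authors' pipeline, which the abstract does not describe in detail. The function names (`normalize`, `deduplicate`, `sample_for_annotation`) and the normalization rules (dropping URLs, user mentions, and a leading "RT") are illustrative assumptions.

```python
import random
import re

# Assumed normalization rules, chosen for illustration only.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
RT_RE = re.compile(r"^rt\s+", re.IGNORECASE)


def normalize(text):
    """Reduce a tweet to a canonical key so retweets and link variants collide."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = RT_RE.sub("", text.strip())
    # Lowercase and drop punctuation, keeping word characters and hashtags.
    text = re.sub(r"[^\w\s#]", " ", text.lower())
    return " ".join(text.split())


def deduplicate(tweets):
    """Keep the first tweet for each normalized key; skip empty keys."""
    seen = set()
    unique = []
    for tweet in tweets:
        key = normalize(tweet)
        if key and key not in seen:
            seen.add(key)
            unique.append(tweet)
    return unique


def sample_for_annotation(tweets, k, seed=0):
    """Draw up to k de-duplicated tweets uniformly at random (hypothetical helper)."""
    unique = deduplicate(tweets)
    rng = random.Random(seed)
    return rng.sample(unique, min(k, len(unique)))


if __name__ == "__main__":
    pool = [
        "RT @user: Flood in town http://t.co/x",
        "Flood in town! https://t.co/y",
        "Bridge collapsed near the river",
    ]
    print(sample_for_annotation(pool, k=2))
```

Exact matching on a normalized key only catches retweets and trivial variants. A real pipeline would likely also need near-duplicate detection (for example, a token-overlap threshold), which this sketch leaves out.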
