TWEET-FID: An Annotated Dataset for Multiple Foodborne Illness Detection Tasks

Foodborne illness is a serious but preventable public health problem: delays in detecting the associated outbreaks result in productivity loss, expensive recalls, public safety hazards, and even loss of life. While social media is a promising source for identifying unreported foodborne illnesses, there is a dearth of labeled datasets for developing effective outbreak detection models. To accelerate the development of machine learning-based models for foodborne outbreak detection, we present TWEET-FID (TWEET-Foodborne Illness Detection), the first publicly available annotated dataset for multiple foodborne illness incident detection tasks. TWEET-FID, collected from Twitter, is annotated with three facets: tweet class, entity type, and slot type, with labels produced by experts as well as by crowdsource workers. We introduce several domain tasks that leverage these three facets: text relevance classification (TRC), entity mention detection (EMD), and slot filling (SF). We describe the end-to-end methodology for dataset design, creation, and labeling that supports model development for these tasks. We provide a comprehensive set of results for these tasks, obtained with state-of-the-art single-task and multi-task deep learning methods on the TWEET-FID dataset. This dataset opens opportunities for future research in foodborne outbreak detection.
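To make the three annotation facets concrete, the following is a minimal sketch of how one annotated tweet might be represented, one facet per task (TRC, EMD, SF). The record layout, the field names, the BIO tagging scheme, the label names (`FOOD`, `SYMPTOM`, `LOCATION`), and the example tweet are all illustrative assumptions rather than the dataset's actual schema. The sketch also assumes that slot tags mark only the entities tied to the reported incident.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AnnotatedTweet:
    """Hypothetical record holding the three facets for one tweet."""
    tokens: List[str]
    relevant: bool          # tweet class, the target of TRC
    entity_tags: List[str]  # per-token BIO entity tags, the target of EMD
    slot_tags: List[str]    # per-token BIO slot tags, the target of SF


def extract_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collect (label, text) spans from a BIO tag sequence.

    An I- tag that does not continue the currently open span is
    treated like O, which closes the open span.
    """
    spans: List[Tuple[str, str]] = []
    label, words = None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if label is not None:
                spans.append((label, " ".join(words)))
            label, words = tag[2:], [token]
        elif tag.startswith("I-") and label == tag[2:]:
            words.append(token)
        else:
            if label is not None:
                spans.append((label, " ".join(words)))
            label, words = None, []
    if label is not None:
        spans.append((label, " ".join(words)))
    return spans


# Made-up example tweet; not taken from the dataset.
example = AnnotatedTweet(
    tokens=["Got", "sick", "after", "eating", "chicken", "salad", "in", "Boston"],
    relevant=True,
    entity_tags=["O", "B-SYMPTOM", "O", "O", "B-FOOD", "I-FOOD", "O", "B-LOCATION"],
    slot_tags=["O", "B-SYMPTOM", "O", "O", "B-FOOD", "I-FOOD", "O", "O"],
)

entities = extract_spans(example.tokens, example.entity_tags)
slots = extract_spans(example.tokens, example.slot_tags)
```

In this layout, TRC is a tweet-level classification over `relevant`, while EMD and SF are token-level sequence labeling problems over the same tokens, which is one reason the three tasks lend themselves to multi-task models.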

