Foodborne illness is a serious but preventable public health problem: delays in detecting the associated outbreaks result in productivity loss, expensive recalls, public safety hazards, and even loss of life. While social media is a promising source for identifying unreported foodborne illnesses, there is a dearth of labeled datasets for developing effective outbreak detection models. To accelerate the development of machine learning-based models for foodborne outbreak detection, we present TWEET-FID (TWEET-Foodborne Illness Detection), the first publicly available annotated dataset for multiple foodborne illness incident detection tasks. TWEET-FID, collected from Twitter, is annotated with three facets: tweet class, entity type, and slot type, with labels produced by experts as well as by crowdsourced workers. We introduce several domain tasks leveraging these three facets: text relevance classification (TRC), entity mention detection (EMD), and slot filling (SF). We describe the end-to-end methodology for dataset design, creation, and labeling that supports model development for these tasks. We provide a comprehensive set of results for these tasks, obtained with state-of-the-art single- and multi-task deep learning methods on the TWEET-FID dataset. This dataset opens opportunities for future research in foodborne outbreak detection.