Early detection of disease outbreaks is crucial for timely intervention by health authorities. Because of the challenges associated with traditional indicator-based surveillance, monitoring informal sources such as online media has become increasingly popular. However, given the number of online articles published every day, manual screening is impractical. To address this, we propose Health Sentinel, a multi-stage information extraction pipeline that uses a combination of ML and non-ML methods to extract events (structured information concerning disease outbreaks or other unusual health events) from online articles. The extracted events are made available to the Media Scanning and Verification Cell (MSVC) at the National Centre for Disease Control (NCDC), Delhi, for analysis, interpretation, and further dissemination to local agencies for timely intervention. Since April 2022, Health Sentinel has processed over 300 million news articles and identified over 95,000 unique health events across India, of which over 3,500 were shortlisted by public health experts at NCDC as potential outbreaks.