COVID-19 India Dataset: Parsing Detailed COVID-19 Data in Daily Health Bulletins from States in India

While India remains one of the hotspots of the COVID-19 pandemic, data about the pandemic from the country has proved largely inaccessible for use at scale. Much of the data exists in unstructured form on the web, and only limited parts of it are available through public APIs that volunteers maintain manually. This arrangement makes detailed data hard to access, and manual data-keeping is difficult to sustain over time. This paper reports on a recently launched project that automates the extraction of such data from public health bulletins using a combination of classical PDF parsers and state-of-the-art ML-based document extraction APIs. We describe the automated data-extraction technique, the nature of the generated data, and promising directions for ongoing work.
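To make the "classical PDF parser" half of that pipeline concrete, here is a minimal sketch of the step that follows text extraction: turning a district-wise table, as a tool such as `pdftotext -layout` might emit it, into structured records. This is not the project's actual code. The column layout (serial number, district, confirmed, recovered, deceased), the function names `parse_bulletin_table` and `to_csv`, and the sample text are all hypothetical, and the district names and counts are placeholders rather than real bulletin data. The ML-based extraction APIs mentioned above are not shown.

```python
import csv
import io
import re
from typing import Dict, List

# Hypothetical text in the style a classical PDF-to-text parser might emit for a
# district-wise table in a state bulletin. Names and counts are placeholders.
SAMPLE_TEXT = """
Sl. No.   District      Confirmed   Recovered   Deceased
1         District A    1,204       1,010       12
2         District B    987         900         7
3         North District C   45     40          0
Total                   2,236       1,950       19
"""

# Assumed row layout: serial number, a (possibly multi-word) district name,
# then three integer columns that may contain thousands separators.
ROW_PATTERN = re.compile(
    r"^\s*(\d+)\s+(.+?)\s+([\d,]+)\s+([\d,]+)\s+([\d,]+)\s*$"
)

FIELDS = ["district", "confirmed", "recovered", "deceased"]


def to_int(value: str) -> int:
    """Convert a count such as '1,204' to an integer."""
    return int(value.replace(",", ""))


def parse_bulletin_table(text: str) -> List[Dict[str, object]]:
    """Extract district-level rows, skipping headers, totals and blank lines."""
    records = []
    for line in text.splitlines():
        match = ROW_PATTERN.match(line)
        if match is None:
            continue  # header, 'Total' row, or other non-data line
        _, district, confirmed, recovered, deceased = match.groups()
        records.append(
            {
                "district": district,
                "confirmed": to_int(confirmed),
                "recovered": to_int(recovered),
                "deceased": to_int(deceased),
            }
        )
    return records


def to_csv(records: List[Dict[str, object]]) -> str:
    """Serialise the parsed records as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


if __name__ == "__main__":
    rows = parse_bulletin_table(SAMPLE_TEXT)
    print(to_csv(rows))
```

Rules like `ROW_PATTERN` are cheap and transparent, but they tend to break whenever a bulletin's layout changes, and each state's format would need its own variant. A sanity check such as comparing the summed district counts against the bulletin's own "Total" row is one simple way to catch such breakage.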
