Geoscientists, like researchers in many other fields, must read large volumes of literature to locate, extract, and aggregate relevant results and data, whether to enable future research or to build scientific databases, yet no existing system supports this use case well. In this paper, based on the findings of a formative study of how geoscientists collaboratively annotate literature and extract and aggregate data, we propose DeepShovel, a publicly available AI-assisted data extraction system designed to support their needs. DeepShovel leverages state-of-the-art neural network models to help researchers easily and accurately annotate papers (in PDF format) and extract data from tables, figures, maps, and other elements through human-AI collaboration. A follow-up user evaluation with 14 researchers suggested that DeepShovel improved users' efficiency in extracting data for building scientific databases and encouraged teams to form larger-scale but more tightly coupled collaborations.