The digital transformation of the scientific publishing industry has led to dramatic improvements in content discoverability and information analytics. Unfortunately, these improvements have not been uniform across research areas. The scientific literature in the arts, humanities and social sciences (AHSS) still lags behind, in part due to the scale of analog backlogs, the persisting importance of national languages, and a publisher ecosystem made up of many small and medium-sized enterprises. We propose a bottom-up approach to support publishers in creating and maintaining their own publication knowledge graphs in the open domain. To this end, we release a pipeline that extracts structured information from the bibliographies and indexes of AHSS publications, then disambiguates, normalizes and exports it as linked data. We test the proposed pipeline on Brill's Classics collection and release an open-source implementation for further use and improvement.