Archival research is a complex task that involves several distinct activities for extracting evidence and knowledge from a set of archival documents. These activities are usually disconnected in terms of data connection and flow, which makes it difficult to revise and re-execute them recursively, and to inspect provenance information at the level of individual data elements. This paper proposes a workflow model for holistic data management in archival research: from transcribing and documenting a set of archival documents, to curating the transcribed data, integrating it into a rich semantic network (knowledge graph), and then exploring the integrated data quantitatively. The workflow is provenance-aware and highly recursive, and it focuses on semantic interoperability, aiming to produce sustainable data of high value and long-term validity. We provide implementation details for each step of the workflow and present its application in maritime history research. We also discuss relevant quality aspects and the lessons learned from applying it in a real context.