Nowadays, metadata is often supplied by the authors themselves upon submission. However, a significant portion of existing research papers has missing or incomplete metadata. German scientific papers come in a large variety of layouts, which makes metadata extraction a non-trivial task requiring a precise way to classify the metadata extracted from the documents. In this paper, we propose a multimodal deep learning approach for metadata extraction from scientific papers in the German language. We consider multiple types of input data by combining natural language processing and computer vision. This model aims to increase the overall accuracy of metadata extraction compared to other state-of-the-art approaches: it exploits both spatial and contextual features to achieve a more reliable extraction. Our model was trained on a dataset of around 8,800 documents and obtains an overall F1-score of 0.923.
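To make the fusion of spatial and contextual features concrete, the following is a minimal sketch of one common way such a multimodal token classifier can be structured, assuming a LayoutLM-style additive fusion of token embeddings and normalized bounding-box features. All module names, dimensions, and the label set are illustrative assumptions, not the paper's actual architecture.

```python
# Minimal sketch: fusing contextual (text) and spatial (layout) features
# for token-level metadata classification. NOT the authors' actual model;
# names, dimensions, and the label count are hypothetical.
import torch
import torch.nn as nn


class MultimodalTokenClassifier(nn.Module):
    """Classifies each token into a metadata field (e.g. title, author)
    by combining a text embedding with a spatial bounding-box embedding."""

    def __init__(self, vocab_size=30000, hidden=256, num_labels=8):
        super().__init__()
        self.text_emb = nn.Embedding(vocab_size, hidden)
        # Each token carries a normalized bounding box (x0, y0, x1, y1)
        # from the page layout; a linear layer maps it into the same
        # space as the text embedding.
        self.spatial_proj = nn.Linear(4, hidden)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(
                d_model=hidden, nhead=4, batch_first=True
            ),
            num_layers=2,
        )
        self.classifier = nn.Linear(hidden, num_labels)

    def forward(self, token_ids, bboxes):
        # Fuse the two modalities by simple addition before encoding.
        x = self.text_emb(token_ids) + self.spatial_proj(bboxes)
        x = self.encoder(x)
        return self.classifier(x)  # (batch, seq_len, num_labels)


if __name__ == "__main__":
    model = MultimodalTokenClassifier()
    tokens = torch.randint(0, 30000, (2, 16))  # dummy token ids
    boxes = torch.rand(2, 16, 4)               # dummy normalized bboxes
    print(model(tokens, boxes).shape)          # torch.Size([2, 16, 8])
```

Additive fusion keeps the sequence length unchanged, so the same encoder and per-token classification head can be used regardless of how many modalities contribute features.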