Open-pit mines have left many regions worldwide inhospitable or uninhabitable. To put these regions back into use, entire stretches of land must be renaturalized. For sustainable subsequent use or transfer to a new primary use, information on many contaminated sites and soils has to be managed permanently. In most cases, this information is available in the form of expert reports in unstructured data collections or file folders, which in the best case are digitized. Due to the size and complexity of the data, it is difficult for a single person to keep an overview of it and make reliable statements. This is one of the most important obstacles to the rapid transfer of these areas to after-use. An information-based approach to this issue supports several Sustainable Development Goals concerning the environment, health, and climate action. We use a stack of Optical Character Recognition (OCR), Text Classification, Active Learning, and Geographic Information System (GIS) visualization to mine and visualize this information effectively. Subsequently, we link the extracted information to geographic coordinates and visualize it in a GIS. Active Learning plays a vital role because our dataset provides no training data. In total, we process nine categories and actively learn their representation in our dataset. We evaluate the OCR, Active Learning, and Text Classification components separately to report the performance of the system. The Active Learning and Text Classification results are mixed: whereas our two categories on restrictions perform sufficiently well ($>$.85 F1), the seven topic-oriented categories were difficult for human coders to annotate, and hence the results achieved only mediocre evaluation scores ($<$.70 F1).
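The abstract does not say which classifier or query strategy drives the Active Learning step. As a rough illustration of how a pool-based Active Learning loop can bootstrap a text classifier when no training data exist, the following Python sketch substitutes a binary multinomial naive Bayes model and least-confidence (uncertainty) sampling. The names `NaiveBayes`, `least_confident`, and `active_learning`, the toy documents, the hypothetical "use restriction" label, and the simulated `oracle` standing in for a human coder are all assumptions for illustration, not the authors' implementation.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; real OCR output would need more cleanup.
    return text.lower().split()


class NaiveBayes:
    """Hypothetical stand-in classifier: binary multinomial naive Bayes
    with Laplace (add-one) smoothing. Labels are 0 or 1."""

    def fit(self, docs, labels):
        self.counts = {0: Counter(), 1: Counter()}
        self.priors = Counter(labels)
        for doc, y in zip(docs, labels):
            self.counts[y].update(tokenize(doc))
        self.vocab_size = len(set(self.counts[0]) | set(self.counts[1]))
        self.n = len(labels)
        return self

    def predict_proba(self, doc):
        """Return P(y = 1 | doc)."""
        log_scores = []
        for y in (0, 1):
            total = sum(self.counts[y].values())
            score = math.log((self.priors[y] + 1) / (self.n + 2))
            for tok in tokenize(doc):
                score += math.log((self.counts[y][tok] + 1) / (total + self.vocab_size))
            log_scores.append(score)
        # Normalize in log space to avoid underflow.
        m = max(log_scores)
        e0, e1 = (math.exp(s - m) for s in log_scores)
        return e1 / (e0 + e1)


def least_confident(model, pool, k):
    """Least-confidence query strategy: pick the k unlabeled documents
    whose predicted probability is closest to 0.5."""
    ranked = sorted(range(len(pool)),
                    key=lambda i: abs(model.predict_proba(pool[i]) - 0.5))
    return ranked[:k]


def active_learning(seed_docs, seed_labels, pool, oracle, rounds, k=1):
    """Pool-based loop: retrain, query the most uncertain documents,
    ask the oracle (a human coder in practice) for labels, repeat."""
    docs, labels, pool = list(seed_docs), list(seed_labels), list(pool)
    for _ in range(rounds):
        if not pool:
            break
        model = NaiveBayes().fit(docs, labels)
        # Pop from the highest index down so earlier indices stay valid.
        for i in sorted(least_confident(model, pool, k), reverse=True):
            doc = pool.pop(i)
            docs.append(doc)
            labels.append(oracle(doc))
    return NaiveBayes().fit(docs, labels), docs, labels


# Toy data; label 1 = "mentions a use restriction" (hypothetical category).
seed_docs = ["use of groundwater is prohibited",
             "the report describes the drilling campaign"]
seed_labels = [1, 0]
pool = ["construction is prohibited on this parcel",
        "samples were taken in spring",
        "residential use is not permitted",
        "the map shows borehole locations"]


def oracle(doc):
    # Simulated human coder for the toy example.
    return int("prohibited" in doc or "not permitted" in doc)


model, docs, labels = active_learning(seed_docs, seed_labels, pool, oracle, rounds=2)
print(len(docs), round(model.predict_proba("storage is prohibited"), 2))
```

Querying the documents the current model is least sure about is one common way to spend a limited annotation budget; with nine categories, a loop like this would run once per category (or with a multi-label model), and any other classifier exposing a probability could replace the naive Bayes stand-in.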