Disruptions in the job market, such as advances in science and technology, crises, and increased competition, have triggered a surge in reskilling and upskilling programs. Information on suitable continuing education options is scattered across many sites, which makes searching for, comparing, and selecting useful programs cumbersome. This paper therefore introduces a knowledge extraction system that integrates reskilling and upskilling options into a single knowledge graph. The system collects educational programs from 488 different providers and uses context extraction to identify and contextualize relevant content. Entity recognition and entity linking methods then draw upon a domain ontology to locate relevant entities such as skills, occupations, and topics. Finally, slot filling integrates entities, based on their context, into the corresponding slots of the continuing education knowledge graph. We also introduce a German gold standard that comprises 169 documents and over 3800 annotations for benchmarking the necessary content extraction, entity recognition, entity linking, and slot filling tasks, and provide an overview of the system's performance.
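The pipeline described above (content extraction, entity recognition and linking against an ontology, then context-based slot filling) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy `ONTOLOGY`, the `SLOT_CUES` cue phrases, the slot names, and the English sample text are hypothetical and are not taken from the paper, whose system works on German documents and almost certainly uses more capable methods than the substring matching shown here.

```python
import re

# Hypothetical mini-ontology: surface form -> entity identifier.
ONTOLOGY = {
    "python": "skill:python",
    "data analysis": "skill:data_analysis",
    "data scientist": "occupation:data_scientist",
}

# Hypothetical context cues that decide which slot a sentence feeds.
SLOT_CUES = {
    "prerequisites": "prerequisite",
    "you will learn": "learning_outcome",
    "aimed at": "target_audience",
}


def extract_content(html):
    """Stand-in for content extraction: drop tags, split into sentences."""
    text = re.sub(r"<[^>]+>", " ", html)
    return [s.strip() for s in re.split(r"[.!?]", text) if s.strip()]


def link_entities(sentence):
    """Stand-in for entity recognition and linking: gazetteer lookup."""
    lowered = sentence.lower()
    return [entity_id for form, entity_id in ONTOLOGY.items() if form in lowered]


def fill_slots(html):
    """Assign each linked entity to a slot chosen by the sentence context."""
    slots = {}
    for sentence in extract_content(html):
        lowered = sentence.lower()
        slot = next((name for cue, name in SLOT_CUES.items() if cue in lowered), None)
        if slot is None:
            continue  # no recognizable context, so the entities stay unassigned
        for entity_id in link_entities(sentence):
            slots.setdefault(slot, []).append(entity_id)
    return slots


page = (
    "<p>Prerequisites: basic Python.</p>"
    "<p>You will learn data analysis. Aimed at every future data scientist.</p>"
)
course_slots = fill_slots(page)
```

In this sketch the same entity could land in different slots depending on the sentence it appears in, which is the role context plays in the slot filling step described above.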