Skill Classification (SC) is the task of classifying job competences from job postings. This work is the first to apply SC to Danish job vacancy data. We release the first Danish job posting dataset, Kompetencer (en: competences), annotated for nested spans of competences. To improve upon coarse-grained annotations, we use the European Skills, Competences, Qualifications and Occupations (ESCO; le Vrang et al., 2014) taxonomy API to obtain fine-grained labels via distant supervision. We study two setups: zero-shot and few-shot classification. We fine-tune English-based models and RemBERT (Chung et al., 2020) and compare them to in-language Danish models. Our results show that RemBERT significantly outperforms all other models in both the zero-shot and the few-shot setting.