The Vietnamese labor market has developed unevenly: the number of university graduates is growing, but so is the unemployment rate. This situation is often attributed to a lack of accurate and timely labor market information, which leads to skill mismatches between labor supply and actual market demand. In building a data monitoring and analytics platform for the labor market, one of the main challenges is automatically detecting occupational skills in labor-related data such as resumes and job listings. Traditional approaches rely on existing taxonomies and/or large annotated datasets to build Named Entity Recognition (NER) models, which makes them expensive and labor-intensive. In this paper, we propose a practical methodology for skill detection in Vietnamese job listings. Rather than treating the task as NER, we frame it as a ranking problem. We propose a pipeline in which candidate phrases are first extracted and ranked by their semantic similarity to the contexts in which they appear; a final classification step then detects skill phrases. We collected three datasets and conducted extensive experiments. The results show that our methodology outperforms an NER model in data-scarce settings.
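To make the extract, rank, classify pipeline more concrete, a minimal Python sketch is given here. It is not the implementation evaluated in the paper. Everything in it is an assumption for illustration: candidates are plain word n-grams, `embed` is a toy character-trigram count vector standing in for a real semantic encoder, and `classify` is a simple score threshold standing in for the final classification step. All function names and parameter values are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercased word tokens; \w is Unicode-aware, so Vietnamese letters are kept.
    return re.findall(r"\w+", text.lower())


def candidate_phrases(tokens, max_len=3):
    # ASSUMPTION: every contiguous n-gram of up to max_len tokens is a candidate.
    phrases = []
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            phrase = " ".join(tokens[i:i + n])
            if phrase not in phrases:
                phrases.append(phrase)
    return phrases


def embed(text):
    # ASSUMPTION: toy embedding made of character-trigram counts,
    # used here only as a stand-in for a real semantic encoder.
    padded = f" {text} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_phrases(context, max_len=3):
    # Score each candidate phrase against the context it was extracted from,
    # then sort from most to least similar.
    tokens = tokenize(context)
    context_vec = embed(" ".join(tokens))
    scored = [(p, cosine(embed(p), context_vec))
              for p in candidate_phrases(tokens, max_len)]
    return sorted(scored, key=lambda item: item[1], reverse=True)


def classify(ranked, threshold=0.3):
    # ASSUMPTION: a fixed threshold replaces the final trained classifier.
    return [phrase for phrase, score in ranked if score >= threshold]


if __name__ == "__main__":
    ranked = rank_phrases("Experience with Python and SQL required")
    for phrase, score in ranked[:5]:
        print(f"{score:.3f}  {phrase}")
    print(classify(ranked))
```

In a realistic setting the trigram vectors would be replaced by embeddings from a pretrained language model, and the threshold by a classifier trained on a small labeled set; the sketch only shows how the three stages connect.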