Investigating Monolingual and Multilingual BERT Models for Vietnamese Aspect Category Detection

Aspect category detection (ACD) is one of the challenging tasks in aspect-based sentiment analysis. Its goal is to identify, from a set of pre-defined categories, the aspect categories mentioned in user-generated reviews. In this paper, we compare the performance of various monolingual pre-trained language models with that of multilingual models on the Vietnamese aspect category detection problem. We conduct experiments on two benchmark datasets, one for the restaurant domain and one for the hotel domain. The experimental results show that the monolingual PhoBERT model outperforms the other models on both datasets. We also evaluate the performance of the multilingual model based on the combination of the whole SemEval-2016 datasets in other languages with the Vietnamese dataset. To the best of our knowledge, this study is the first attempt to apply the various available pre-trained language models to the aspect category detection task and to utilize datasets from other languages through multilingual models.
