COVID-19 has spread across the world, and several vaccines have been developed to counter its surge. To identify the sentiments associated with these vaccines in social media posts, we fine-tune several state-of-the-art pre-trained transformer models on tweets related to COVID-19 vaccines. Specifically, we use the recently introduced state-of-the-art pre-trained transformer models RoBERTa, XLNet and BERT, as well as the domain-specific transformer models CT-BERT and BERTweet, which are pre-trained on COVID-19 tweets. We further explore text augmentation by oversampling with the Language Model based Oversampling Technique (LMOTE) to improve the accuracy of these models, particularly for small-sample datasets with an imbalanced class distribution among the positive, negative and neutral sentiment classes. Our results summarize our findings on the suitability of text oversampling for imbalanced small-sample datasets used to fine-tune state-of-the-art pre-trained transformer models, and on the utility of domain-specific transformer models for this classification task.
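To illustrate the class-balancing goal that oversampling serves, the sketch below duplicates minority-class tweets at random until all three sentiment classes are equally represented. Note this is a simplification for illustration: LMOTE instead *generates* new synthetic minority-class texts with a language model rather than duplicating existing ones, and the function and example data here are hypothetical, not taken from the paper's dataset.

```python
import random
from collections import Counter

def oversample_to_balance(texts, labels, seed=0):
    """Naive random oversampling: duplicate minority-class samples until
    every class matches the majority-class count. (LMOTE, by contrast,
    generates novel synthetic texts with a language model; this sketch
    only illustrates the balanced class distribution being targeted.)"""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_texts, out_labels = list(texts), list(labels)
    for cls, n in counts.items():
        pool = [t for t, y in zip(texts, labels) if y == cls]
        for _ in range(target - n):
            out_texts.append(rng.choice(pool))
            out_labels.append(cls)
    return out_texts, out_labels

# Hypothetical toy data: "positive" is the majority class (3 samples),
# "negative" and "neutral" are minority classes (1 sample each).
texts = ["great vaccine", "bad reaction", "no side effects",
         "got my shot", "feeling fine"]
labels = ["positive", "negative", "positive", "neutral", "positive"]
bal_texts, bal_labels = oversample_to_balance(texts, labels)
print(Counter(bal_labels))  # each of the 3 classes now has 3 samples
```

The balanced set (9 samples instead of 5) would then be used to fine-tune the transformer classifier, so that gradient updates are not dominated by the majority sentiment class.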