With the growth of social media platforms such as Twitter, vast amounts of user-generated data emerge daily. The short texts published on Twitter -- the tweets -- have gained significant attention as a rich source of information for many decision-making processes. However, their inherent characteristics, such as an informal and noisy linguistic style, remain challenging for many natural language processing (NLP) tasks, including sentiment analysis. Sentiment classification is tackled mainly by machine learning-based classifiers. The literature has adopted word representations of distinct natures to transform tweets into vector-based inputs for sentiment classifiers. These representations range from simple count-based methods, such as bag-of-words, to more sophisticated ones, such as BERTweet, built upon the popular BERT architecture. Nevertheless, most studies evaluate those models using only a small number of datasets. Despite the progress made in recent years in language modelling, there is still a gap regarding a robust evaluation of induced embeddings applied to sentiment analysis on tweets. Furthermore, while fine-tuning models on downstream tasks is prominent nowadays, less attention has been given to adjustments based on the specific linguistic style of the data. In this context, this study provides an assessment of existing language models for distinguishing the sentiment expressed in tweets, using a rich collection of 22 datasets from distinct domains and five classification algorithms. The evaluation includes static and contextualized representations. Contextualized representations are obtained from Transformer-based autoencoder models that are also fine-tuned on the masked language model task, using a variety of strategies.
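The pipeline described above -- representing tweets as vectors and feeding them to a machine learning classifier -- can be illustrated with a minimal sketch using the simplest representation mentioned, bag-of-words. The toy tweets and labels below are invented for illustration and are not from the study's 22 datasets; scikit-learn is assumed as the classification toolkit.

```python
# Minimal sketch: count-based (bag-of-words) vectors feeding a sentiment classifier.
# Toy data only; contextualized models such as BERTweet would replace the vectorizer.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

tweets = ["love this phone", "worst service ever", "great day", "really bad food"]
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative (toy labels)

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(tweets)  # sparse matrix: tweets x vocabulary counts
clf = LogisticRegression().fit(X, labels)

pred = clf.predict(vectorizer.transform(["love this day"]))
```

Swapping the bag-of-words step for a static or contextualized embedding model changes only how `X` is produced; the classifier interface stays the same, which is what makes the study's comparison across representations and five classification algorithms possible.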