Sentiment analysis (SA) has become an extensive research area in recent years, with impact on diverse fields including e-commerce, consumer business, and politics, driven by the increasing adoption and usage of social media platforms. Extracting topics and sentiments from the unsupervised short texts that emerge in such contexts is challenging: they may contain figurative words, strident data, and words or phrases with many possible coexisting meanings, all of which can lead to incorrect topics. Most prior research is based on content focused on a specific theme or rhetoric and drawn from a clean dataset. The work reported here demonstrates the effectiveness of BERT (Bidirectional Encoder Representations from Transformers) in sentiment classification on a raw, live dataset taken from a popular microblogging platform. A novel T-BERT framework is proposed to show the enhanced performance obtainable by combining latent topics with contextual BERT embeddings. Numerical experiments were conducted on an ensemble with about 42,000 datasets using the NimbleBox.ai platform, with a hardware configuration consisting of an Nvidia Tesla K80 (CUDA), a 4-core CPU, and 15 GB of RAM running on an isolated Google Cloud Platform instance. The empirical results show that performance improves when topics are added to BERT, and that BERT with the proposed approach reaches an accuracy of 90.81% on sentiment classification.
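The idea of combining latent topics with contextual embeddings can be illustrated with a minimal sketch. The abstract does not state how T-BERT fuses the two representations, so plain concatenation followed by a linear softmax layer is an assumption made here for illustration only. The function names, the toy 4-dimensional embedding (a real BERT base embedding has 768 dimensions), the 3-topic distribution, and the weights are all hypothetical.

```python
import math
from typing import List, Sequence


def combine_features(embedding: Sequence[float],
                     topic_dist: Sequence[float]) -> List[float]:
    """Concatenate a contextual embedding with a topic distribution.

    Assumption: concatenation is used as the fusion step; the source
    does not specify the actual mechanism used by T-BERT.
    """
    return list(embedding) + list(topic_dist)


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(features: Sequence[float],
             weights: Sequence[Sequence[float]],
             biases: Sequence[float]) -> List[float]:
    """Apply one linear layer plus softmax to get class probabilities."""
    scores = [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(scores)


# Toy stand-ins (hypothetical values, not taken from the paper).
toy_embedding = [0.1, -0.2, 0.3, 0.4]   # placeholder for a BERT sentence vector
toy_topics = [0.7, 0.2, 0.1]            # placeholder for a latent topic distribution

features = combine_features(toy_embedding, toy_topics)

# One weight row per sentiment class (negative, positive); 7 inputs each.
toy_weights = [
    [0.5, -0.1, 0.2, 0.0, 0.3, -0.4, 0.1],
    [-0.3, 0.2, 0.1, 0.4, -0.2, 0.5, 0.0],
]
toy_biases = [0.0, 0.1]

probabilities = classify(features, toy_weights, toy_biases)
print(probabilities)
```

In practice the embedding would come from a pretrained BERT encoder, the topic vector from a topic model fitted to the same corpus, and the classifier weights would be learned rather than fixed by hand.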