Comparison of Machine Learning for Sentiment Analysis in Detecting Anxiety Based on Social Media Data

All groups of people have felt the impact of the COVID-19 pandemic. This situation triggers anxiety, which is harmful for everyone. The government plays an influential role in addressing these problems through its work programs, but those programs have pros and cons that themselves cause public anxiety. It is therefore necessary to detect anxiety in order to improve government programs and raise public expectations. This study applies machine learning to detect anxiety in social media comments about government programs for dealing with the pandemic. The approach adopts sentiment analysis, detecting anxiety from netizens' positive and negative comments. The machine learning methods implemented are K-NN, Bernoulli, Decision Tree Classifier, Support Vector Classifier, Random Forest, and XGBoost. The data sample was obtained by crawling YouTube comments and comprises 4862 comments: 3211 negative and 1651 positive. Negative comments indicate anxiety, while positive comments indicate hope (not anxious). The classifiers are trained on features extracted with count vectorization and TF-IDF. The data were divided into 3889 comments for training and 973 for testing. Random Forest achieved the highest accuracy, at 84.99% with count vectorization and 82.63% with TF-IDF. K-NN gave the best precision, while XGBoost gave the best recall. Thus, Random Forest is the most accurate method for detecting anxiety from social media data.
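The feature extraction and evaluation steps named in the abstract can be illustrated with a minimal, standard-library sketch. It is not the authors' code: the whitespace tokenizer, the unsmoothed textbook weighting tf × ln(N/df), the 80/20 split fraction, the toy comments, and all function names are assumptions made for illustration. Library implementations typically use smoothed TF-IDF variants, so their weights would differ.

```python
import math
from collections import Counter

# Figures reported in the abstract.
TOTAL, NEGATIVE, POSITIVE = 4862, 3211, 1651


def split_sizes(n, test_fraction=0.2):
    """Return (train, test) sizes, rounding the test size up.

    The 0.2 fraction is an assumption; the abstract reports only the counts.
    """
    test = math.ceil(n * test_fraction)
    return n - test, test


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def count_vectorize(docs):
    """Build a sorted vocabulary and one term-count row per document."""
    vocab = sorted({tok for doc in docs for tok in tokenize(doc)})
    index = {term: i for i, term in enumerate(vocab)}
    rows = []
    for doc in docs:
        row = [0] * len(vocab)
        for tok, count in Counter(tokenize(doc)).items():
            row[index[tok]] = count
        rows.append(row)
    return vocab, rows


def tfidf(rows):
    """Weight raw counts by ln(N / df), the unsmoothed textbook form."""
    n_docs, n_terms = len(rows), len(rows[0])
    df = [sum(1 for r in rows if r[j] > 0) for j in range(n_terms)]
    idf = [math.log(n_docs / df[j]) for j in range(n_terms)]
    return [[r[j] * idf[j] for j in range(n_terms)] for r in rows]


def evaluate(y_true, y_pred, target="anxious"):
    """Return (accuracy, precision, recall) with `target` as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == target and p == target)
    fp = sum(1 for t, p in pairs if t != target and p == target)
    fn = sum(1 for t, p in pairs if t == target and p != target)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


if __name__ == "__main__":
    print(split_sizes(TOTAL))
    vocab, counts = count_vectorize(["i am worried", "i am hopeful"])
    print(vocab, counts, tfidf(counts))
```

Under the assumed 80/20 split, `split_sizes(4862)` reproduces the 3889 and 973 counts given in the abstract. In the toy example, terms that occur in every comment receive a TF-IDF weight of zero, which is the property that separates TF-IDF from raw counts.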

Related content

Machine Learning

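Machine Learning is an international forum for research on computational approaches to learning. The journal publishes articles reporting substantive results on a wide range of learning methods applied to a variety of learning problems. It features papers that describe research on problems and methods, applications research, and issues of research methodology. Papers making claims about learning problems or methods provide solid support through empirical studies, theoretical analysis, or comparison to psychological phenomena. Applications papers show how to apply learning methods to solve important application problems. Research methodology papers improve how machine learning research is conducted. All papers describe the supporting evidence in ways that other researchers can verify or replicate. Papers also detail the learning component and discuss assumptions regarding knowledge representation and the performance task. Official website: http://dblp.uni-trier.de/db/journals/ml/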