Vaccine hesitancy and other COVID-19-related concerns and complaints in the Philippines are evident on social media. It is important to identify these topics and sentiments in order to gauge public opinion, use the insights to develop policies, and make the adjustments needed to improve the public image and reputation of the administering agency and of the COVID-19 vaccines themselves. This paper proposes a semi-supervised machine learning pipeline that performs topic modeling, sentiment analysis, and an analysis of vaccine brand reputation to obtain an in-depth understanding of the national public opinion of Filipinos on Facebook. The methodology uses a multilingual version of Bidirectional Encoder Representations from Transformers (BERT) for topic modeling, hierarchical clustering, five different classifiers for sentiment analysis, and cosine similarity of BERT topic embeddings for vaccine brand reputation analysis. Results suggest that any type of COVID-19 misinformation is an emergent property of COVID-19 public opinion, and that the detection of COVID-19 misinformation can be an unsupervised task. Sentiment analysis aided by hierarchical clustering reveals that 21 of the 25 topics extracted by topic modeling are negative. Such negative comments spike in count whenever the Department of Health in the Philippines posts about the COVID-19 situation in other countries. Additionally, the high number of laugh reactions on Facebook posts by the same agency -- posts without any humorous content -- suggests that users who react to these posts tend to do so not because of what the posts are about but because of who posted them.
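To illustrate the brand reputation step, the following minimal sketch shows how cosine similarity could rank topic embeddings against a vaccine brand embedding. It is not the pipeline used in this paper: the three-dimensional vectors are made-up stand-ins for BERT embeddings, and the names `cosine_similarity`, `rank_topics_for_brand`, `toy_topics`, and `toy_brand` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_topics_for_brand(
    brand_vec: Sequence[float],
    topic_vecs: Dict[str, Sequence[float]],
) -> List[Tuple[str, float]]:
    """Sort topics from most to least similar to the brand embedding."""
    scored = [
        (name, cosine_similarity(brand_vec, vec))
        for name, vec in topic_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Assumption: toy vectors invented for illustration, not real embeddings.
toy_topics = {
    "topic_a": [1.0, 0.0, 0.0],
    "topic_b": [0.0, 1.0, 0.0],
    "topic_c": [1.0, 1.0, 0.0],
}
toy_brand = [1.0, 0.0, 0.0]

ranking = rank_topics_for_brand(toy_brand, toy_topics)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In a setting like the one described, topics ranked closest to a brand could then be read alongside their sentiment labels to characterize how that brand is discussed; how the paper actually combines the two is not specified in this abstract.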