Sentiment Analysis and Effect of COVID-19 Pandemic using College SubReddit Data

The COVID-19 pandemic has affected societies and human health and well-being in various ways. In this study, we collected Reddit data from 2019 (pre-pandemic) and 2020 (pandemic) from the subreddit communities associated with 8 universities, applied natural language processing (NLP) techniques, and trained graph neural networks on the social media data to study how the pandemic has affected people's emotions and psychological states compared to the pre-pandemic era. Specifically, we first applied a pre-trained Robustly Optimized BERT Pretraining Approach (RoBERTa) model to learn embeddings from the semantic information of Reddit messages, and trained a graph attention network (GAT) for sentiment classification. Using a GAT allows us to leverage the relational information among the messages during training. We then applied subgroup-adaptive model stacking to combine the prediction probabilities from RoBERTa and GAT and yield the final sentiment classification. With the manually labeled and model-predicted sentiment labels on the collected data, we applied a generalized linear mixed-effects model to estimate the effects of the pandemic and online teaching on people's sentiment and to test their statistical significance. The results suggest that the odds of negative sentiments in 2020 are $14.6\%$ higher than the odds in 2019 ($p$-value $<0.001$), and that the odds of negative sentiments are $41.6\%$ higher with in-person teaching than with online teaching in 2020 ($p$-value $=0.037$) in the studied population.
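
To make the reported effect sizes concrete, the following is a minimal sketch of how a percentage increase in odds relates to a log-odds coefficient and to a change in probability. It assumes a logit link, which the reported odds suggest but the abstract does not state explicitly. The function names and the baseline probability of 0.30 are hypothetical and chosen only for illustration; they are not taken from the study.

```python
import math


def pct_increase_to_log_odds(pct_increase: float) -> float:
    """Convert a percentage increase in odds into a log-odds coefficient."""
    return math.log(1.0 + pct_increase / 100.0)


def shift_probability(baseline_prob: float, odds_ratio: float) -> float:
    """Apply an odds ratio to a baseline probability and return the new probability."""
    odds = baseline_prob / (1.0 - baseline_prob)
    new_odds = odds * odds_ratio
    return new_odds / (1.0 + new_odds)


# Odds ratios implied by the percentages reported in the abstract.
or_year = 1.0 + 14.6 / 100.0      # 2020 vs. 2019
or_teaching = 1.0 + 41.6 / 100.0  # in-person vs. online teaching in 2020

# HYPOTHETICAL baseline probability of a negative message (not from the study).
baseline = 0.30

print("log-odds coefficient, year effect:", pct_increase_to_log_odds(14.6))
print("log-odds coefficient, teaching effect:", pct_increase_to_log_odds(41.6))
print("probability after year effect:", shift_probability(baseline, or_year))
print("probability after teaching effect:", shift_probability(baseline, or_teaching))
```

Note that an odds ratio is not a ratio of probabilities: the same odds ratio produces a different change in probability depending on the baseline. In a mixed-effects model the coefficient is also conditional on the random effects, so this conversion describes a single group held fixed rather than a population average.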
