Millions of people use platforms such as YouTube, Facebook, Twitter, and other mass media. Because these platforms are so accessible, they are often used to establish a narrative, conduct propaganda, and disseminate misinformation. This work proposes an approach that uses state-of-the-art NLP techniques to extract features from video captions (subtitles). To evaluate our approach, we use a publicly accessible, labeled dataset for classifying videos as misinformation or not. The motivation for exploring video captions stems from our analysis of video metadata: attributes such as the number of views, likes, dislikes, and comments are ineffective because videos are hard to differentiate using this information. Using the caption dataset, the proposed models can classify videos among three classes (Misinformation, Debunking Misinformation, and Neutral) with an F1-score of 0.85 to 0.90. To emphasize the relevance of the misinformation class, we reformulate the task as a two-class classification problem: Misinformation vs. others (Debunking Misinformation and Neutral). In our experiments, the proposed models can classify videos in this two-class setting with an F1-score of 0.92 to 0.95 and an AUC ROC of 0.78 to 0.90.
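The two-class reformulation and the two reported metrics can be illustrated with a minimal sketch. This is not the authors' pipeline: the helper names (`to_binary`, `f1_score`, `roc_auc`), the 0.5 decision threshold, and the toy labels and scores are assumptions made purely for illustration, and the F1 computed here is for the Misinformation class only, since the abstract does not state which averaging scheme was used.

```python
from typing import List

def to_binary(labels: List[str]) -> List[int]:
    """Collapse the three classes: 1 for Misinformation, 0 for the other two."""
    return [1 if label == "Misinformation" else 0 for label in labels]

def f1_score(y_true: List[int], y_pred: List[int]) -> float:
    """F1 of the positive (Misinformation) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def roc_auc(y_true: List[int], scores: List[float]) -> float:
    """AUC ROC as the chance a random positive outscores a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Ties between a positive and a negative score count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data, invented for illustration only (not taken from the dataset).
labels = ["Misinformation", "Neutral", "Debunking Misinformation",
          "Misinformation", "Neutral"]
scores = [0.9, 0.2, 0.6, 0.4, 0.1]  # hypothetical model scores for Misinformation

y_true = to_binary(labels)
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

f1 = f1_score(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"F1 = {f1:.3f}, AUC ROC = {auc:.3f}")
```

F1 depends on the chosen threshold, while AUC ROC summarizes ranking quality across all thresholds, which is one reason the two numbers can differ noticeably for the same model.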