In this work, we collect a moderate-sized, representative corpus of approximately 200,000 tweets pertaining to Covid-19 vaccination, spanning seven months (September 2020 to March 2021). Following a transfer learning approach, we use the pre-trained Transformer-based XLNet model to classify tweets as Misleading or Non-Misleading, and we manually validate a random subset of the results. We build on this classification to study and contrast the characteristics of misleading and non-misleading tweets in the corpus. This exploratory analysis enables us to design features (such as sentiments, hashtags, nouns, and pronouns) that can, in turn, be exploited to classify tweets as (Non-)Misleading with various ML models in an explainable manner. Specifically, several ML models are employed for prediction, achieving up to 90% accuracy, and the importance of each feature is explained using the SHAP Explainable AI (XAI) tool. While the thrust of this work is principally exploratory analysis aimed at obtaining insights into the online discourse on Covid-19 vaccination, we conclude the paper by outlining how these insights provide the foundations for a more actionable approach to mitigating misinformation. The curated dataset and code are made available in a GitHub repository so that the research community at large can reproduce, compare against, or build upon this work.
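To make the idea of hand-designed tweet features concrete, the following is a minimal sketch of how a tweet might be mapped to a few countable surface features before being passed to a classifier. It is not the authors' implementation: the function `extract_features`, the feature names, and the small `PRONOUNS` list are all hypothetical, and the sentiment and noun features mentioned in the abstract are omitted because they would require an NLP library.

```python
import re
from collections import Counter

# Hypothetical, deliberately small pronoun list (an assumption for illustration);
# a real pipeline would more likely rely on a part-of-speech tagger.
PRONOUNS = {"i", "we", "you", "he", "she", "they", "it", "me", "us", "them"}


def extract_features(tweet: str) -> dict:
    """Map one tweet to a few illustrative surface features."""
    # Hashtags are counted separately and then removed from the token stream.
    hashtags = re.findall(r"#\w+", tweet)
    text_without_tags = re.sub(r"#\w+", " ", tweet.lower())
    tokens = re.findall(r"[a-z']+", text_without_tags)
    counts = Counter(tokens)
    return {
        "n_hashtags": len(hashtags),
        "n_pronouns": sum(counts[p] for p in PRONOUNS),
        "n_tokens": len(tokens),
        "n_exclamations": tweet.count("!"),
    }


features = extract_features(
    "They said the #vaccine is safe, but we know better! #covid19"
)
print(features)
```

Feature dictionaries of this kind can be stacked into a numeric matrix and fed to a standard classifier, whose per-feature contributions could then be inspected with an attribution tool such as SHAP.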