Misinformation is still a major societal problem and the arrival of Large Language Models (LLMs) only added to it. This paper analyzes synthetic, false, and genuine information in the form of text from spectral analysis, visualization, and explainability perspectives to find the answer to why the problem is still unsolved despite multiple years of research and a plethora of solutions in the literature. Various embedding techniques on multiple datasets are used to represent information for the purpose. The diverse spectral and non-spectral methods used on these embeddings include t-distributed Stochastic Neighbor Embedding (t-SNE), Principal Component Analysis (PCA), and Variational Autoencoders (VAEs). Classification is done using multiple machine learning algorithms. Local Interpretable Model-Agnostic Explanations (LIME), SHapley Additive exPlanations (SHAP), and Integrated Gradients are used for the explanation of the classification. The analysis and the explanations generated show that misinformation is quite closely intertwined with genuine information and the machine learning algorithms are not as effective in separating the two despite the claims in the literature.
翻译:暂无翻译