This study explores the generation and evaluation of synthetic fake news through fact based manipulations using large language models (LLMs). We introduce a novel methodology that extracts key facts from real articles, modifies them, and regenerates content to simulate fake news while maintaining coherence. To assess the quality of the generated content, we propose a set of evaluation metrics coherence, dissimilarity, and correctness. The research also investigates the application of synthetic data in fake news classification, comparing traditional machine learning models with transformer based models such as BERT. Our experiments demonstrate that transformer models, especially BERT, effectively leverage synthetic data for fake news detection, showing improvements with smaller proportions of synthetic data. Additionally, we find that fact verification features, which focus on identifying factual inconsistencies, provide the most promising results in distinguishing synthetic fake news. The study highlights the potential of synthetic data to enhance fake news detection systems, offering valuable insights for future research and suggesting that targeted improvements in synthetic data generation can further strengthen detection models.
翻译:暂无翻译