Distinguishing fake news from satire or humor poses a unique challenge because the two share overlapping linguistic features while diverging in intent. This study develops the WISE (Web Information Satire and Fakeness Evaluation) framework, which benchmarks eight lightweight transformer models alongside two baseline models on a balanced dataset of 20,000 Fakeddit samples annotated as either fake news or satire. Using stratified 5-fold cross-validation, we evaluate the models with a comprehensive metric suite: accuracy, precision, recall, F1-score, ROC-AUC, PR-AUC, Matthews correlation coefficient (MCC), Brier score, and Expected Calibration Error (ECE). MiniLM, a lightweight model, achieves the highest accuracy (87.58\%) of all models, while RoBERTa-base achieves the highest ROC-AUC (95.42\%) together with strong accuracy (87.36\%). DistilBERT offers an excellent efficiency-accuracy trade-off at 86.28\% accuracy and 93.90\% ROC-AUC. Paired t-tests and McNemar tests confirm statistically significant performance differences between models. Our findings show that lightweight models can match or exceed baseline performance, offering actionable guidance for deploying misinformation detection systems in real-world, resource-constrained settings.
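To make the evaluation protocol concrete, the following is a minimal Python sketch of stratified 5-fold cross-validation with the reported metric suite and both significance tests. It uses synthetic data and two plain scikit-learn classifiers as stand-ins for the benchmarked transformer models; `model_a`, `model_b`, the synthetic split, and the binary positive-class ECE variant are illustrative assumptions, not the paper's actual pipeline.

```python
import numpy as np
from scipy.stats import ttest_rel
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score, average_precision_score,
                             matthews_corrcoef, brier_score_loss)
from statsmodels.stats.contingency_tables import mcnemar


def expected_calibration_error(y_true, y_prob, n_bins=10):
    # Binary positive-class ECE (an assumed variant): bin predictions by
    # confidence, then take the sample-weighted mean of
    # |mean confidence - empirical accuracy| across bins.
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (y_prob > lo) & (y_prob <= hi)
        if mask.any():
            ece += mask.mean() * abs(y_prob[mask].mean() - y_true[mask].mean())
    return ece


# Synthetic stand-in for the 20,000-sample balanced fake-vs-satire split.
X, y = make_classification(n_samples=2000, weights=[0.5, 0.5], random_state=0)

# Two cheap classifiers stand in for two of the benchmarked models.
model_a = LogisticRegression(max_iter=1000)
model_b = LogisticRegression(max_iter=1000, C=0.01)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
fold_acc_a, fold_acc_b = [], []
truth, preds_a, preds_b = [], [], []

for train_idx, test_idx in skf.split(X, y):
    X_tr, X_te = X[train_idx], X[test_idx]
    y_tr, y_te = y[train_idx], y[test_idx]
    prob_a = model_a.fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
    prob_b = model_b.fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
    pred_a = (prob_a >= 0.5).astype(int)
    pred_b = (prob_b >= 0.5).astype(int)

    fold_acc_a.append(accuracy_score(y_te, pred_a))
    fold_acc_b.append(accuracy_score(y_te, pred_b))
    truth.extend(y_te); preds_a.extend(pred_a); preds_b.extend(pred_b)

    # Full per-fold metric suite for model A.
    print(f"acc={accuracy_score(y_te, pred_a):.3f} "
          f"prec={precision_score(y_te, pred_a):.3f} "
          f"rec={recall_score(y_te, pred_a):.3f} "
          f"f1={f1_score(y_te, pred_a):.3f} "
          f"roc_auc={roc_auc_score(y_te, prob_a):.3f} "
          f"pr_auc={average_precision_score(y_te, prob_a):.3f} "
          f"mcc={matthews_corrcoef(y_te, pred_a):.3f} "
          f"brier={brier_score_loss(y_te, prob_a):.3f} "
          f"ece={expected_calibration_error(y_te, prob_a):.3f}")

# Paired t-test on the per-fold accuracies of the two models.
t_stat, p_val = ttest_rel(fold_acc_a, fold_acc_b)
print(f"paired t-test: t={t_stat:.3f}, p={p_val:.4f}")

# McNemar test on pooled out-of-fold predictions: the off-diagonal cells
# count samples where exactly one model is correct.
truth = np.array(truth)
a_ok = np.array(preds_a) == truth
b_ok = np.array(preds_b) == truth
table = [[np.sum(a_ok & b_ok), np.sum(a_ok & ~b_ok)],
         [np.sum(~a_ok & b_ok), np.sum(~a_ok & ~b_ok)]]
print(mcnemar(table, exact=False, correction=True))
```

Note the division of labor between the two tests: the paired t-test compares per-fold accuracy means, while the McNemar test operates on per-sample agreement, which is why the sketch pools out-of-fold predictions before building the 2x2 disagreement table.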