The proliferation of fake news, i.e., news intentionally spread for misinformation, poses a threat to individuals and society. Despite various fact-checking websites such as PolitiFact, robust detection techniques are required to deal with the increase in fake news. Several deep learning models show promising results for fake news classification; however, their black-box nature makes it difficult to explain their classification decisions and to quality-assure the models. Here, we address this problem by proposing a novel interpretable fake news detection framework based on the recently introduced Tsetlin Machine (TM). In brief, we utilize the conjunctive clauses of the TM to capture lexical and semantic properties of both true and fake news text. Further, we use the clause ensembles to calculate the credibility of fake news. For evaluation, we conduct experiments on two publicly available datasets, PolitiFact and GossipCop, and demonstrate that the TM framework significantly outperforms previously published baselines by at least $5\%$ in terms of accuracy, with the added benefit of an interpretable logic-based representation. Further, our approach yields a higher F1-score than BERT and XLNet, albeit at slightly lower accuracy. We finally present a case study on our model's explainability, demonstrating how its decisions decompose into meaningful words and their negations.
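To make the clause-based mechanism concrete, the following is a minimal illustrative sketch, not the paper's actual implementation: it shows how conjunctive clauses over word literals and their negations can jointly vote on a fake-vs-real label, with the signed clause sum serving as a credibility-style score. All clause contents and function names here are hypothetical.

```python
# Illustrative sketch (hypothetical, not the authors' code): conjunctive
# clauses over word literals vote for or against the "fake" class.

def clause_matches(clause, present_words):
    """A clause is (included_words, negated_words); it fires only if every
    included word is present and every negated word is absent."""
    include, exclude = clause
    return include <= present_words and not (exclude & present_words)

def credibility_score(pos_clauses, neg_clauses, document_words):
    """Signed clause vote: positive-polarity clauses support 'fake',
    negative-polarity clauses oppose it; the sign predicts the class."""
    present = set(document_words)
    votes_for = sum(clause_matches(c, present) for c in pos_clauses)
    votes_against = sum(clause_matches(c, present) for c in neg_clauses)
    return votes_for - votes_against

# Toy, hand-written clauses purely for illustration:
pos = [({"shocking", "secret"}, {"reuters"})]   # pattern typical of fake news
neg = [({"reported"}, {"shocking"})]            # pattern typical of real news

print(credibility_score(pos, neg, ["shocking", "secret", "claim"]))  # → 1
```

Because each clause is a readable conjunction of words and negated words, the same structure that produces the score also yields the word-level explanations described in the case study.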