Disinformation through fake news is an ongoing problem in our society and is easily spread through social media. The most cost- and time-effective way to filter these large volumes of data is to use a combination of human and technical interventions to identify fake news. From a technical perspective, Natural Language Processing (NLP) is widely used for fake news detection. Social media companies use NLP techniques to identify fake news and warn their users, but fake news may still slip through undetected. This is especially a problem in more localised contexts outside the United States of America. How do we adjust fake news detection systems to work better for local contexts such as South Africa? In this work we investigate fake news detection on South African websites. We curate a dataset of South African fake news and then train detection models. We contrast this with using widely available fake news datasets (mostly from USA websites). We also explore making the datasets more diverse by combining them, and use interpretable machine learning to observe how the writing style of fake news differs between nations.