Despite increasing awareness of and research on fake news, there is still a significant need for datasets that specifically target racial slurs and biases within North American political speeches. This is particularly important in the context of upcoming North American elections. This study introduces a comprehensive dataset that illuminates these critical aspects of misinformation. To develop this fake news dataset, we scraped and built a corpus of 40,000 news articles about political discourse in North America. A portion of this dataset (4,000 articles) was then carefully annotated using a blend of advanced language models and human verification methods. We have made both datasets openly available to the research community and have benchmarked models on the annotated data to demonstrate its utility. We release the best-performing language model along with the data. We encourage researchers and developers to make use of this dataset and contribute to this ongoing initiative.