False information has a significant negative impact on individuals and on society as a whole. Especially in the current COVID-19 era, we are witnessing unprecedented growth in medical misinformation. To help tackle this problem with machine learning approaches, we are publishing a feature-rich dataset of approximately 317k medical news articles and blog posts and 3.5k fact-checked claims. It also contains 573 manually labelled and more than 51k automatically labelled mappings between claims and articles. Each mapping consists of claim presence, i.e., whether a claim is contained in a given article, and the article's stance towards the claim. We provide several baselines for these two tasks and evaluate them on the manually labelled part of the dataset. The dataset also enables a number of additional tasks related to medical misinformation, such as misinformation characterisation studies or studies of misinformation diffusion between sources.