The rise of fake news as a political weapon has become a global concern and has highlighted the inability of manual fact-checking to keep pace with rapidly produced fake news. Statistical approaches are therefore required if we are to address this problem efficiently. A shortage of publicly available datasets is one major bottleneck for automated fact-checking. To remedy this, we collected 24K manually rated statements from PolitiFact. The class values exhibit a natural order with respect to truthfulness, as shown in Table 1. Our task thus departs from standard classification because of the varying degrees of similarity between classes. To investigate this, we define coarse-to-fine classification regimes, which present a new challenge for classification, and we propose BERT-based models to address it. After training, class similarity is evident on the multi-class datasets, especially the fine-grained one. Under all regimes, BERT achieves state-of-the-art performance, while the additional layers provide insignificant improvement.