Automatic fake news detection models are ostensibly based on logic: the truth of a claim made in a headline can be determined by supporting or refuting evidence found in a resulting web query. These models are believed to be reasoning in some way; however, it has been shown that the same results, or better, can be achieved without considering the claim at all -- only the evidence. This implies that other signals are contained within the examined evidence, and may derive from manipulable factors such as emotion, sentiment, or part-of-speech (POS) frequencies, which are vulnerable to adversarial inputs. We neutralize some of these signals through multiple forms of neural and non-neural pre-processing and style transfer, and find that this flattening of extraneous indicators can induce the models to actually require both claim and evidence to perform well. We conclude by constructing a model that uses emotion vectors built from a lexicon and passed through an "emotional attention" mechanism to appropriately weight certain emotions. We provide quantifiable results that support our hypothesis that manipulable features are being used for fact-checking.
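The lexicon-based emotion representation described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the tiny lexicon, the 8 NRC-style emotion dimensions, and the fixed scoring vector `w` (which would normally be a learned parameter) are all assumptions made for the example.

```python
import numpy as np

# Hypothetical miniature emotion lexicon mapping a token to an 8-dim
# vector over (anger, anticipation, disgust, fear, joy, sadness,
# surprise, trust). Real work would load a full lexicon file.
LEXICON = {
    "shocking": np.array([0, 1, 0, 1, 0, 0, 1, 0], dtype=float),
    "fraud":    np.array([1, 0, 1, 1, 0, 1, 0, 0], dtype=float),
    "report":   np.array([0, 0, 0, 0, 0, 0, 0, 1], dtype=float),
}
ZERO = np.zeros(8)

def emotional_attention(tokens, w):
    """Pool per-token emotion vectors with softmax attention weights
    scored against w (fixed here; normally a learned parameter)."""
    E = np.stack([LEXICON.get(t, ZERO) for t in tokens])  # (n, 8)
    scores = E @ w                                        # (n,)
    weights = np.exp(scores) / np.exp(scores).sum()       # softmax
    return weights @ E                                    # (8,) pooled

w = np.ones(8)  # placeholder scoring vector
rep = emotional_attention(["shocking", "fraud", "report"], w)
```

Tokens whose emotion vectors score highly against `w` dominate the pooled representation, which is the sense in which the attention mechanism "weights certain emotions" before classification.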