False medical information on social media poses harm to people's health. While the need for biomedical fact-checking has been recognized in recent years, user-generated medical content has received comparably little attention. At the same time, models for other text genres might not be reusable, because the claims they have been trained with are substantially different. For instance, claims in the SciFact dataset are short and focused: "Side effects associated with antidepressants increases risk of stroke". In contrast, social media holds naturally-occurring claims, often embedded in additional context: "'If you take antidepressants like SSRIs, you could be at risk of a condition called serotonin syndrome' Serotonin syndrome nearly killed me in 2010. Had symptoms of stroke and seizure." This showcases the mismatch between real-world medical claims and the input that existing fact-checking systems expect. To make user-generated content checkable by existing models, we propose to reformulate the social-media input in such a way that the resulting claim mimics the claim characteristics in established datasets. To accomplish this, our method condenses the claim with the help of relational entity information and either compiles the claim out of an entity-relation-entity triple or extracts the shortest phrase that contains these elements. We show that the reformulated input improves the performance of various fact-checking models as opposed to checking the tweet text in its entirety.
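The two reformulation strategies described above can be illustrated with a minimal sketch. All function names, the triple format, and the example triple below are illustrative assumptions for exposition, not the paper's actual implementation; the span extraction shown is a simple greedy version that does not handle repeated entity mentions.

```python
def compile_claim(triple):
    """Compile a claim sentence from an (entity, relation, entity) triple.

    Hypothetical strategy 1: verbalize the triple directly.
    """
    subj, rel, obj = triple
    return f"{subj} {rel} {obj}."


def shortest_span(tweet, entities):
    """Extract the shortest substring of the tweet that covers all entities.

    Hypothetical strategy 2: keep the original wording but condense it
    to the span containing the triple's elements. Uses the first
    occurrence of each entity (a simplification).
    """
    positions = []
    for entity in entities:
        start = tweet.lower().find(entity.lower())
        if start == -1:
            return None  # entity not mentioned in the tweet
        positions.append((start, start + len(entity)))
    return tweet[min(s for s, _ in positions):max(e for _, e in positions)]


tweet = ("If you take antidepressants like SSRIs, you could be at risk "
         "of a condition called serotonin syndrome")
triple = ("SSRIs", "may cause", "serotonin syndrome")  # assumed triple

print(compile_claim(triple))
# SSRIs may cause serotonin syndrome.
print(shortest_span(tweet, ["SSRIs", "serotonin syndrome"]))
# SSRIs, you could be at risk of a condition called serotonin syndrome
```

Either output is far closer in length and focus to a SciFact-style claim than the full tweet, which is the property the reformulation aims for.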