Detecting misinformation threads is crucial to maintaining a healthy environment on social media. We address the problem using a data set created during the COVID-19 pandemic. It contains cascades of tweets discussing information weakly labeled as reliable or unreliable, based on a prior evaluation of the information source. Models that identify unreliable threads usually rely on textual features, but reliability depends not only on what is said, but also on who says it and to whom. We therefore additionally leverage network information. Following the homophily principle, we hypothesize that users who interact are generally interested in similar topics and spread similar kinds of news, which in turn tend to be either reliable or unreliable. We test several methods for learning representations of the social interactions within the cascades, combining them with deep neural language models in a Multi-Input (MI) framework. By keeping track of the sequence of interactions over time, we improve over previous state-of-the-art models.
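The fusion step described above can be illustrated with a minimal sketch. All names, dimensions, and the toy sequence encoder below are illustrative assumptions, not the paper's actual architecture: a text embedding (standing in for a neural language model's output) is concatenated with an embedding of the cascade's interaction sequence, then scored by a linear classifier.

```python
import numpy as np

rng = np.random.default_rng(0)

def embed_cascade(user_ids, dim=16):
    """Toy sequence encoder (hypothetical): mean of per-user random
    embeddings, standing in for a learned representation of the
    interactions within a cascade."""
    table = {u: rng.standard_normal(dim) for u in set(user_ids)}
    return np.mean([table[u] for u in user_ids], axis=0)

def multi_input_score(text_emb, cascade_emb, W, b):
    """Multi-Input fusion sketch: concatenate the two representations
    and apply a linear layer followed by a sigmoid."""
    x = np.concatenate([text_emb, cascade_emb])
    return 1.0 / (1.0 + np.exp(-(W @ x + b)))

text_emb = rng.standard_normal(32)          # stand-in for LM output
casc_emb = embed_cascade([1, 2, 3, 2, 1])   # reply/retweet user sequence
W = rng.standard_normal(48)                 # 32 + 16 fused dimensions
b = 0.0
p_unreliable = multi_input_score(text_emb, casc_emb, W, b)
```

In a trained model, `W` and `b` would be learned jointly with both encoders; here they are random, so `p_unreliable` is only a well-formed probability, not a meaningful prediction.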