Controversial content refers to any content that attracts both positive and negative feedback. Identifying it automatically, especially on social media, is challenging because it must be done over a large number of continuously evolving posts that cover a wide variety of topics. Most existing approaches rely on the graph structure of a topic discussion and/or the content of messages. This paper proposes a controversy detection approach based on both the graph structure of a discussion and its text features. Our approach relies on a Graph Neural Network (GNN) to encode the graph representation, including its texts, into an embedding vector. A graph classification step then labels the post as controversial or not. Two controversy detection strategies are proposed. The first is based on hierarchical graph representation learning: the user nodes of the graph are embedded hierarchically and iteratively to compute the embedding vector of the whole graph. The second is based on an attention mechanism, which allows each user node to give more or less importance to each of its neighbors when computing node embeddings. We evaluate our approach on several real-world datasets, and the experiments show that combining textual features with structural information improves performance.
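As a rough illustration of the attention-based strategy, the following pure-Python sketch runs one single-head, GAT-style attention layer over a toy discussion graph. It then averages the resulting user-node embeddings into a graph vector and scores that vector with a logistic classifier. Everything here is an assumption made for illustration and is not the authors' implementation. This includes the function names, the toy graph, the random untrained weights, and the use of plain feature vectors as placeholders for text features. It also includes the mean-pooling readout, which stands in for, and is much simpler than, the hierarchical pooling of the first strategy.

```python
import math
import random
from typing import Dict, List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Sequence[float]) -> Vector:
    """Multiply matrix w (out_dim x in_dim) by vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def leaky_relu(x: float, slope: float = 0.2) -> float:
    return x if x > 0 else slope * x


def attention_layer(features: List[Vector], neighbors: Dict[int, List[int]],
                    w: Matrix, a: Vector) -> List[Vector]:
    """Hypothetical GAT-style layer: each user node weighs itself and its
    neighbors with softmax-normalised attention scores, then aggregates."""
    projected = [matvec(w, h) for h in features]  # W h_i for every node
    out = []
    for i, h_i in enumerate(projected):
        candidates = [i] + neighbors.get(i, [])  # include a self-loop
        # Score e_ij = LeakyReLU(a^T [W h_i || W h_j]); '+' concatenates lists
        scores = [leaky_relu(sum(ak * v for ak, v in zip(a, h_i + projected[j])))
                  for j in candidates]
        m = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        alphas = [e / total for e in exps]  # attention weights, sum to 1
        # Attention-weighted sum of projected neighbor vectors
        agg = [sum(alpha * projected[j][k] for alpha, j in zip(alphas, candidates))
               for k in range(len(h_i))]
        # tanh is an arbitrary non-linearity choice for this sketch
        out.append([math.tanh(v) for v in agg])
    return out


def readout(embeddings: List[Vector]) -> Vector:
    """Mean pooling over all user nodes -> one graph embedding
    (a simple stand-in for hierarchical pooling)."""
    n = len(embeddings)
    return [sum(col) / n for col in zip(*embeddings)]


def classify(graph_vec: Vector, weights: Vector, bias: float) -> float:
    """Logistic score: probability that the discussion is controversial."""
    z = sum(wk * gk for wk, gk in zip(weights, graph_vec)) + bias
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    rng = random.Random(0)
    in_dim, hid = 4, 3
    # Hypothetical discussion graph: 4 users, edges = replies between them
    feats = [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(4)]
    nbrs = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}
    # Random, untrained parameters (a real model would learn these)
    w = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)] for _ in range(hid)]
    a = [rng.uniform(-0.5, 0.5) for _ in range(2 * hid)]
    h = attention_layer(feats, nbrs, w, a)
    g = readout(h)
    p = classify(g, [rng.uniform(-1, 1) for _ in range(hid)], 0.0)
    print(f"P(controversial) = {p:.3f}")
```

In a trained model, `w`, `a`, and the classifier weights would be learned, for example with a binary cross-entropy loss over labeled discussion graphs. The per-edge weights `alphas` are what let each user node give more or less importance to each neighbor.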