Breaking the Communities: Characterizing community changing users using text mining and graph machine learning on Twitter

Even though the Internet and social media have increased the amount of news and information people can consume, most users are only exposed to content that reinforces their positions and isolates them from other ideological communities. This environment has real consequences for our lives, including severe political polarization, the easy spread of fake news, political extremism, hate groups and a lack of enriching debate. Encouraging conversations between different groups of users and breaking closed communities is therefore important for healthy societies. In this paper, we characterize and study users who break out of their community on Twitter, using natural language processing techniques and graph machine learning algorithms. In particular, we collected 9 million Twitter messages from 1.5 million users and constructed the retweet networks. We identified their communities and the topics of discussion associated with them. With this data, we present a machine learning framework for social media user classification which detects "community breakers", i.e. users that swing from their closed community to another one. A feature importance analysis on three politically polarized Twitter datasets showed that these users have low PageRank values, suggesting that the changes are driven by the lack of response to their messages within their own communities. This methodology also allowed us to identify their specific topics of interest, providing a full characterization of this kind of user.
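To make the PageRank finding concrete, the following is a minimal sketch and not the authors' implementation. It assumes a retweet network with one directed edge from each retweeter to the original author, a plain power-iteration PageRank with the conventional damping factor of 0.85, and community labels that are already aligned between two time windows. The function names, the toy user names and the edge direction are all illustrative assumptions that do not come from the abstract.

```python
def build_retweet_graph(retweets):
    """Build a directed graph with an edge retweeter -> original author (assumed direction)."""
    graph = {}
    for retweeter, author in retweets:
        graph.setdefault(retweeter, set()).add(author)
        graph.setdefault(author, set())  # authors who never retweet are still nodes
    return graph


def pagerank(graph, damping=0.85, iterations=100):
    """Power-iteration PageRank; damping=0.85 is a conventional default, not from the paper."""
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by dangling nodes (no outgoing edges) is spread uniformly.
        dangling = sum(rank[node] for node in nodes if not graph[node])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for node in nodes:
            if graph[node]:
                share = damping * rank[node] / len(graph[node])
                for target in graph[node]:
                    new_rank[target] += share
        rank = new_rank
    return rank


def community_breakers(before, after):
    """Users whose community label differs between two time windows (labels assumed aligned)."""
    return sorted(u for u in before if u in after and before[u] != after[u])


# Toy data: erin retweets others but is never retweeted herself.
retweets = [
    ("bob", "alice"), ("carol", "alice"), ("dave", "alice"),
    ("alice", "bob"), ("erin", "bob"), ("erin", "alice"),
]
ranks = pagerank(build_retweet_graph(retweets))

# Hypothetical community labels in two consecutive time windows.
before = {"alice": 0, "bob": 0, "erin": 0}
after = {"alice": 0, "bob": 0, "erin": 1}
breakers = community_breakers(before, after)
```

In this toy network erin receives no retweets, so her score stays at the teleportation floor while alice and bob accumulate rank, and she is also the only user whose label changes between windows. This mirrors, in miniature, the pattern the abstract reports, where users with low PageRank are the ones who move to another community.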
