A longitudinal study of the top 1% toxic Twitter profiles

Toxicity is endemic to online social networks, including Twitter. It follows a Pareto-like distribution in which most of the toxicity is generated by a very small number of profiles, so analyzing and characterizing these toxic profiles is critical. Prior research has largely characterized toxicity on the platform through sporadic, event-centric toxic content. We instead approach the problem from a profile-centric point of view. We study 143K Twitter profiles and focus on the behavior of the top 1% of producers of toxic content on Twitter, ranked by the toxicity scores that the Perspective API assigns to their tweets. With a total of 293M tweets spanning 16 years of activity, the longitudinal data allow us to reconstruct the timelines of all profiles involved. We use these timelines to compare the behavior of the most toxic Twitter profiles with that of the rest of the Twitter population. For highly toxic accounts, we study tweet posting patterns (how frequently and how prolifically they post), the nature of their hashtags and URLs, their profile metadata, and their Botometer scores. We find that highly toxic profiles post coherent and well-articulated content; that their tweets keep to a narrow theme, with lower diversity in hashtags, URLs, and domains; that they are thematically similar to each other; and that they have a high likelihood of bot-like behavior. Their high fake-follower scores further suggest that they are likely to have progenitors with intentions to influence. Our work contributes insight into the top 1% of toxic profiles on Twitter and shows that a profile-centric approach to investigating toxicity on Twitter is beneficial.

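Translated summary: Social networks, including Twitter, are rife with toxic content, and the Perspective API quantifies the toxicity of individual tweets as a score. Toxicity follows a Pareto-like distribution: the great majority of toxic content is produced by a very small number of accounts, which makes analyzing and characterizing those accounts essential. Earlier studies characterized toxicity on the platform mainly through sporadic, event-centric toxic content. In contrast, we study the problem from an account-centric perspective. We examine 143K Twitter accounts and focus on the behavior of the top 1% of producers of toxic content, based on the tweet toxicity scores provided by the Perspective API. The sample comprises 293M tweets covering 16 years of activity, and these longitudinal data let us reconstruct the timeline of every account involved. We use the timelines to compare the behavior of highly toxic accounts with that of other Twitter accounts. We analyze the posting patterns of highly toxic accounts in terms of posting frequency, tweet volume, the characteristics of their hashtags and URLs, account metadata, and Botometer scores. We find that highly toxic accounts post coherent, clearly expressed content, keep to a narrow theme, show low diversity in hashtags, URLs, and domains, resemble one another thematically, and are highly likely to exhibit bot-like behavior. Their fake-follower scores are high, which suggests that the accounts were likely created with an intent to influence others. Our work provides insight into the top 1% of toxic accounts on Twitter and shows that studying toxicity on Twitter from an account-centric perspective is beneficial.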
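
The profile-centric selection described above can be illustrated with a minimal sketch. It rests on assumptions the abstract does not confirm: each profile is ranked by the mean toxicity score of its tweets, and hashtag diversity is measured as Shannon entropy. The function names `top_toxic_profiles` and `hashtag_entropy` and the input layout are hypothetical, chosen only for illustration, and the scores are assumed to have been obtained from the Perspective API beforehand.

```python
import math
from collections import Counter, defaultdict
from statistics import mean


def top_toxic_profiles(tweets, fraction=0.01):
    """Return the profile ids in the top `fraction` by mean toxicity.

    `tweets` is an iterable of (profile_id, toxicity_score) pairs.
    Ranking by the per-profile mean is an assumption, not the
    aggregation confirmed by the abstract.
    """
    scores = defaultdict(list)
    for profile_id, score in tweets:
        scores[profile_id].append(score)
    # Most toxic profiles first.
    ranked = sorted(scores, key=lambda p: mean(scores[p]), reverse=True)
    # Keep at least one profile, even for very small samples.
    k = max(1, math.ceil(len(ranked) * fraction))
    return set(ranked[:k])


def hashtag_entropy(hashtags):
    """Shannon entropy (in bits) of a profile's hashtag usage.

    Lower values mean the profile keeps to fewer distinct hashtags.
    Entropy is an assumed stand-in for the diversity measure.
    """
    counts = Counter(hashtags)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

Under these assumptions, a profile that reuses a single hashtag has an entropy of zero, while one that spreads its tweets evenly over many hashtags scores higher. That gives one concrete way to read the "lower diversity" finding.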