In recent years, social media has been criticized for fostering polarization. Identifying emerging disagreements and growing polarization is important for journalists, who need to raise alerts and provide more balanced coverage. Recent studies have shown that polarization exists on social media. However, they have focused primarily on a limited set of topics, such as politics, and on large volumes of data collected over long periods, typically months or years. Although these findings are helpful, they come too late to trigger an immediate alert. To address this gap, we develop a domain-agnostic mining method that identifies polarized topics on Twitter within a short period, namely 12 hours. Using this method, we find that 31.6\% of daily Japanese news-related topics in early 2022 were polarized within a 12-hour window. Our analysis also shows that these polarized topics tend to form information diffusion networks with a relatively high average degree, and that half of their tweets are posted by a relatively small number of users. However, because of the limitations of the Twitter API, collecting a large volume of tweets on many topics every day to monitor polarization is costly and impractical. To reduce this cost, we also develop a machine-learning method that predicts a topic's polarization level from randomly collected tweets by leveraging network information. Extensive experiments show significant savings in collection costs compared with baseline methods. In particular, our approach achieves an F-score of 0.85 with only 4,000 tweets, a 4x saving over the baseline. To the best of our knowledge, our work is the first to predict the polarization level of topics from a small number of tweets. Our findings have profound implications for the news media, allowing journalists to detect polarizing information and report on it quickly and efficiently.
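The abstract does not say how the diffusion network is built or how the two network statistics it mentions are computed. The sketch below is only an illustration under stated assumptions. It assumes an undirected network whose nodes are users, with an edge between two users linked by a retweet. It computes the average degree as 2|E|/|V| and the smallest number of users who together account for at least half of a topic's tweets. The function names `average_degree` and `users_for_half_of_tweets` are hypothetical and do not come from the paper.

```python
from collections import Counter

def average_degree(edges):
    """Average degree 2|E|/|V| of an undirected graph given as (u, v) pairs.

    Assumption of this sketch: duplicate edges and self-loops are dropped,
    and only users that appear in at least one edge count as nodes.
    """
    unique_edges = {frozenset((u, v)) for u, v in edges if u != v}
    nodes = {n for edge in unique_edges for n in edge}
    if not nodes:
        return 0.0
    return 2 * len(unique_edges) / len(nodes)

def users_for_half_of_tweets(authors):
    """Smallest number of users whose tweets make up at least half of all tweets.

    `authors` holds one author ID per tweet (hypothetical input format).
    """
    counts = sorted(Counter(authors).values(), reverse=True)
    total = sum(counts)
    covered = 0
    for k, c in enumerate(counts, start=1):
        covered += c  # greedily add the most active remaining user
        if 2 * covered >= total:
            return k
    return 0  # no tweets at all

# Hypothetical toy data: retweet edges and the author of each tweet.
retweets = [("alice", "bob"), ("carol", "bob"), ("dave", "bob"), ("alice", "carol")]
print(average_degree(retweets))  # 4 edges, 4 users -> 2.0

authors = ["bob"] * 6 + ["alice", "carol", "dave", "erin"]
print(users_for_half_of_tweets(authors))  # bob alone wrote 6 of 10 tweets -> 1
```

Sorting users by tweet count and adding them greedily gives the minimum number of users needed to reach half the tweets. Dividing that number by the total number of users gives a simple measure of how concentrated tweet production is.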