Although echo chambers in social media have been under considerable scrutiny, general models for their detection and analysis are missing. In this work, we aim to fill this gap by proposing a probabilistic generative model that explains social media footprints (i.e., social network structure and propagations of information) through a set of latent communities, each characterized by a degree of echo-chamber behavior and by an opinion polarity. Specifically, echo chambers are modeled as communities that are permeable to pieces of information with similar ideological polarity and impermeable to information of opposed leaning; this makes it possible to discriminate echo chambers from communities that lack a clear ideological alignment. To learn the model parameters, we propose a scalable, stochastic adaptation of the Generalized Expectation Maximization algorithm that optimizes the joint likelihood of observing social connections and information propagation. Experiments on synthetic data show that our algorithm correctly reconstructs ground-truth latent communities together with their degree of echo-chamber behavior and opinion polarity. Experiments on real-world data about polarized social and political debates, such as the Brexit referendum and the COVID-19 vaccine campaign, confirm the effectiveness of our proposal in detecting echo chambers. Finally, we show how our model can improve accuracy in auxiliary predictive tasks, such as stance detection and prediction of future propagations.
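To make the notion of permeability concrete, the following is a minimal illustrative sketch and not the model proposed in this work. It assumes polarities lie in [-1, 1] and the degree of echo-chamber behavior lies in [0, 1]; the function name `propagation_probability`, the parameter `base_rate`, and the linear form are all hypothetical choices made only for illustration.

```python
def propagation_probability(item_polarity, community_polarity,
                            echo_degree, base_rate=0.5):
    """Toy permeability of a community to a piece of information.

    Hypothetical parameterization (not taken from the paper):
    item_polarity and community_polarity are in [-1, 1],
    echo_degree is in [0, 1], base_rate is in [0, 1].
    """
    # Alignment is 1.0 when the polarities coincide and 0.0 when
    # they are maximally opposed (-1 versus +1).
    alignment = 1.0 - abs(item_polarity - community_polarity) / 2.0
    # A community with echo_degree = 0 ignores polarity entirely;
    # with echo_degree = 1 the probability scales fully with alignment.
    return base_rate * ((1.0 - echo_degree) + echo_degree * alignment)


# A full echo chamber with polarity +1: aligned versus opposed item.
aligned = propagation_probability(1.0, 1.0, echo_degree=1.0)
opposed = propagation_probability(-1.0, 1.0, echo_degree=1.0)
# A community with no echo-chamber behavior treats both items alike.
neutral = propagation_probability(-1.0, 1.0, echo_degree=0.0)
```

Under these assumptions, the echo chamber lets the aligned item through and blocks the opposed one, while the non-echo-chamber community is equally permeable to both, which is the distinction the abstract describes.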